Abstract. In this article we will retrace one of the great mathematical adventures
of this century: the discovery of the soliton and the gradual explanation of
its remarkable properties in terms of hidden symmetries. We will take an historical
approach, starting with a famous numerical experiment carried out by Fermi, Pasta,
and Ulam on one of the first electronic computers, and with Zabusky and Kruskal's
insightful explanation of the surprising results of that experiment (and of a follow-up
experiment of their own) in terms of a new concept they called "solitons". Solitons
however raised even more questions than they answered. In particular, the evolution
equations that govern solitons were found to be Hamiltonian and have infinitely
many conserved quantities, pointing to the existence of many non-obvious symmetries.
We will cover next the elegant approach to solitons in terms of the Inverse
Scattering Transform and Lax Pairs, and finally explain how those ideas led step-by-step
to the discovery that Loop Groups, acting by "Dressing Transformations", give
a conceptually satisfying explanation of the secret soliton symmetries.
1. Introduction

In the past several decades, two major themes have dominated developments in 
the theory of dynamical systems. On the one hand there has been a remarkable 
and rapid development in the theory of so-called "chaotic" systems, with a gradual 
clarification of the nature and origins of the surprising properties from which these 
systems get their name. Here what cries out to be explained is how a system 
that is deterministic can nevertheless exhibit behavior that appears erratic and 
unpredictable. 

In this article I will be discussing a second class of systems — equally puzzling, but 
for almost the opposite reason. For these so-called "integrable systems", the challenge
is to explain the striking predictability, regularities, and quasi-periodicities
exhibited by their solutions, a behavior particularly apparent for a special class of
solutions, called "solitons". The latter exhibit a "particle-like" behavior that gives
them their name; for example they have geometric shapes that show a remarkable
degree of survivability under conditions that one might normally expect to destroy
such features. 

Such conservation of geometric features is known to be intimately bound up with 
notions of symmetry — in fact, when suitably formalized, a famous theorem of E. 
Noether states that conserved quantities correspond to one-parameter groups of 
automorphisms of the dynamical system — and therein lies a puzzle. These systems 
do not have manifestly obvious symmetries to account for these anomalous con- 
servation laws, and to fully understand their surprising behavior we must search 
for the secret sources of their hidden symmetries. This article will be about that 
search, and about the many mathematical treasures it has so far revealed. 

A major problem for anyone attempting an exposition of "soliton mathematics" 
or "integrable systems" is the vast extent of its literature. The theory had its origins 
in the 1960's, and so can be considered relatively recent. But early research in the 
subject revealed mysterious new mathematical phenomena that quickly attracted 
the attention and stimulated the curiosity of many mathematicians throughout the 
world. As these researchers took up the intriguing challenge of understanding these 
new phenomena, an initial trickle of papers soon grew to a torrent, and the eventual 
working out of the details of the theory resulted from a concerted effort by hundreds 
of mathematicians whose results are spread over a still growing bibliography of many 
thousands of papers. 

Attempting to cover the subject in sufficient detail to mention all these contrib- 
utions — or even most of the important contributions — would require hundreds of 
pages. I have neither the time nor the expertise to undertake such a task, and 
instead I have tried to provide a guided tour through what I consider some of the 
major highlights of the subject. But the reader should realize that any attempt to 
compress such a massive subject in so few pages must be an exercise in selectivity 
that will in large measure reflect personal taste and biases of the author rather than 
some objective measure of importance. 

Another disclaimer: as we proceed I will try to present some of the remarkable 
story of how the subject began and developed. I say "story" rather than "history" 
because my report will be anecdotal in nature. I will try to be accurate, but I do 
not pretend to have done careful historical research. It is particularly important 
to keep in mind that during most of the development of the theory of integrable 
systems there was a very large and active group of mathematicians working on the 
subject in the former Soviet Union. Since communication of results between this 
group and the group of western mathematicians working in the field was slower 
than that within each group, even more than usual there were frequent cases in 
which similar advances were made nearly simultaneously in one group and the 
other. Statements made in this article to the effect that some person discovered a 
certain fact should not be interpreted as claiming that person had priority or sole 
priority in the discovery. 

There have been a number of fine volumes written that make a serious effort 
to encompass the bulk of soliton theory, giving careful historical and bibliographic 
references. I hope my abbreviated account will stimulate readers to consult these 
more complete sources, several of which are listed in the references ([AC], [FT], [N], 
[NMPZ]). 

The organization of this article will be in part historical. We will start with some 
surprising numerical experiments of Fermi-Pasta-Ulam and of Zabusky-Kruskal that 
were the origins of soliton theory. We will next consider the remarkable Inverse
Scattering Transform and the related concept of Lax Pairs, first in the original context
of the Korteweg-de Vries (KdV) equation, and then for the more general hierarchies
of integrable systems introduced by Zakharov and Shabat and by Ablowitz,
Kaup, Newell, and Segur (ZS-AKNS). We will trace how developments that grew
out of the ZS-AKNS approach eventually led to a synthesis that explains most of
the phenomena of soliton theory from a unified viewpoint. In particular, it uncovers
the source of the hidden symmetries of solitons, explaining both the existence of so
many commuting constants of the motion and also the characteristic phenomenon
of Bäcklund Transformations. This synthesis had its origins in the idea of "dressing
transformations", and in explaining it I will follow the recent approach of Chuu-lian
Terng and Karen Uhlenbeck. I would like to express my sincere thanks to Chuu-lian 
for putting up with my countless requests that she interrupt her own work in order 
to explain to me some detail of this approach. Without these many hours of help, 
it would not have been possible for me to complete this article. 

This article is a revised version of notes from a series of Rudolf-Lipschitz Lectures 
that I delivered at Bonn University in January and February of 1997. I would like to 
thank the Mathematisches Institut of Universität Bonn and its Sonderforschungsbereich
256 for honoring me with the invitation to give that lecture series, and to
thank the lively and high-level audience who, by their interest, stimulated me to 
write up my rough notes. 

My thanks to Bob Palais for pointing out a problem in my original discussion of 
split-stepping, and for helping me to re-write it.

And special thanks to barbara n beeton for an exceptional job of proof-reading. 
The many changes she suggested have substantially improved readability. 

2. Review of Classical Mechanics 

In this section we will review Classical Mechanics, in both the Lagrangian and 
Hamiltonian formulations. This is intended mainly to establish notational conven- 
tions, not as an exposition for novices. We shall also review the basic geometry of 
symplectic manifolds. 

1. Newton's Equations 

Let $C$ be a Riemannian manifold ("configuration space") and $\Pi : TC \to C$ its tangent
bundle. A vector field $X$ on $TC$ is called a second order ODE on $C$ if $D\Pi(X_v) = v$
for all $v$ in $TC$. If $\gamma$ is a solution curve of $X$ and $\sigma = \Pi(\gamma)$ is its projection onto
$C$ then, by the chain rule, $\sigma'(t) = D\Pi(\gamma'(t)) = D\Pi(X_{\gamma(t)}) = \gamma(t)$, i.e., $\gamma$ is the
velocity field of its projection. An easy argument shows conversely that if this is
true for all solutions of a vector field $X$ on $TC$ then $X$ is a second order ODE on
$C$. For this reason we shall say that a smooth curve $\sigma(t)$ in $C$ satisfies the second
order ODE $X$ if $\sigma'$ is a solution curve of $X$.

Given coordinates $x_1, \dots, x_n$ for $C$ in $O$, we define associated "canonical" coordinates
$q_1, \dots, q_n, \dot q_1, \dots, \dot q_n$ in $\Pi^{-1}(O)$ by $q_i = x_i \circ \Pi$ and $\dot q_i = dx_i$. Let $\sigma : [a,b] \to C$
be a smooth curve in $C$, $\sigma' : [a,b] \to TC$ its velocity. If we define $x_i(t) = x_i(\sigma(t))$
and $q_i(t) = q_i(\sigma'(t)) = x_i(t)$, then $\dot q_i(t) := \dot q_i(\sigma'(t)) = dx_i(\sigma'(t)) = \frac{dx_i(t)}{dt} = \frac{dq_i(t)}{dt}$.
It follows that a vector field $X$ on $TC$ is a second order ODE if and only if in each
canonical coordinate system it has the form $X = \sum_i \left( \dot q_i \, \partial/\partial q_i + F_i(q, \dot q) \, \partial/\partial \dot q_i \right)$, or
equivalently the condition for $\sigma'$ to be a solution of $X$ is that $dq_i(t)/dt = \dot q_i(t)$,
$d\dot q_i(t)/dt = F_i(q(t), \dot q(t))$ (so $d^2 x_i(t)/dt^2 = F_i(x(t), dx(t)/dt)$, explaining why it
is called a second order ODE).
The classic example of a second order ODE on $C$ is the vector field $X$ generating
the geodesic flow on $TC$: for each $v$ in $TC$ the solution curve of $X$ with initial
condition $v$ is $\sigma'$, where $\sigma(t) = \exp(tv)$ is the unique geodesic on $C$ with $\sigma'(0) = v$.
In local coordinates, $x_i(\sigma(t))$ satisfy the system:

$$\frac{d^2 x_i}{dt^2} = -\Gamma^i_{jk}(x) \frac{dx_j}{dt} \frac{dx_k}{dt}$$

(where the $\Gamma^i_{jk}$ are the Christoffel symbols). What we shall call Newton's Equations
(NE) for $C$ is a second order ODE $X^U$ for $C$ that is a slight generalization of the
geodesic flow and is determined by a smooth real-valued function $U$ on $C$ called the
potential energy function:

$$(NE) \qquad \frac{d^2 x_i}{dt^2} = -\Gamma^i_{jk}(x) \frac{dx_j}{dt} \frac{dx_k}{dt} - \frac{\partial U}{\partial x_i}$$

[Here is an intrinsic, geometric description of (NE). The gradient of $U$, $\nabla U$, is a
vector field on $C$, and we call $-\nabla U$ the force. If $\sigma(t)$ is any smooth curve in $C$,
and $v(t)$ is any tangent vector field along $\sigma$ (i.e., a lifting of $\sigma$ to $TC$), then the
Levi-Civita connection allows us to covariantly differentiate $v$ along $\sigma$ to produce
another vector field $Dv/dt$ along $\sigma$. In particular, if for $v$ we take the velocity field
$\sigma'(t)$, we can interpret $D\sigma'/dt$ as the acceleration of $\sigma$, and the curve $\sigma$ satisfies
Newton's Equations (for the potential $U$) if and only if $D\sigma'/dt = -\nabla U$.]

2. The Lagrangian Viewpoint 

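We define the kinetic energy function $K$ on $TC$ by $K(v) = \frac{1}{2}\|v\|^2$, and we also
consider the potential energy as a function on $TC$ by $U(v) = U(\Pi(v))$. Their
difference $\mathcal{L} = K - U$ is called the Lagrangian function on $TC$, and if $\sigma : [a,b] \to C$
is any smooth curve in $C$ we define its action $A(\sigma) = \int_a^b \mathcal{L}(\sigma'(t))\,dt$. In canonical
coordinates as above, $\mathcal{L}(q, \dot q) = \frac{1}{2}\sum_{ij} g_{ij}\,\dot q_i \dot q_j - U(q)$, so if we write $x_i(t) = x_i(\sigma(t))$,
then $q_i(\sigma'(t)) = x_i(t)$, $\dot q_i(\sigma'(t)) = dx_i/dt$, and therefore

$$A(\sigma) = \int_a^b \mathcal{L}(q(t), \dot q(t))\,dt = \int_a^b \left( \frac{1}{2}\sum_{ij} g_{ij}(x(t)) \frac{dx_i}{dt} \frac{dx_j}{dt} - U(x(t)) \right) dt.$$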

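Let $\sigma_\epsilon : [a,b] \to C$ be a smooth one-parameter family of curves defined for $\epsilon$
near zero, and with $\sigma_0 = \sigma$. If we define $\delta\sigma = \left(\frac{\partial}{\partial\epsilon}\right)_{\epsilon=0} \sigma_\epsilon$ (a vector field along
$\sigma$) then it is easy to see that $\left(\frac{d}{d\epsilon}\right)_{\epsilon=0} A(\sigma_\epsilon)$ depends only on $\sigma$ and $\delta\sigma$, and we
denote it by $DA_\sigma(\delta\sigma)$. Define $q_i(t,\epsilon) = q_i(\sigma'_\epsilon(t)) = x_i(\sigma_\epsilon(t))$, $\delta q_i(t) = \partial q_i(t,0)/\partial\epsilon$,
$\dot q_i(t,\epsilon) = \dot q_i(\sigma'_\epsilon(t))$ and $\delta\dot q_i(t) = \partial \dot q_i(t,0)/\partial\epsilon$. Then clearly $\dot q_i(t,\epsilon) = \partial q_i(t,\epsilon)/\partial t$, so,
by equality of cross derivatives, $\delta\dot q_i(t) = \frac{d}{dt}\delta q_i(t)$.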

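It is now easy to compute $DA_\sigma(\delta\sigma)$. In fact, differentiating under the integral
sign, using the chain rule, and integrating by parts gives:

$$DA_\sigma(\delta\sigma) = \int_a^b \sum_i \left( \frac{\partial \mathcal{L}}{\partial q_i}\,\delta q_i + \frac{\partial \mathcal{L}}{\partial \dot q_i}\,\delta\dot q_i \right) dt = \int_a^b \sum_i \left( \frac{\partial \mathcal{L}}{\partial q_i} - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q_i} \right) \delta q_i \, dt + \left[ \sum_i \frac{\partial \mathcal{L}}{\partial \dot q_i}\,\delta q_i \right]_a^b,$$

where the boundary term equals $[\langle \sigma'(t), \delta\sigma(t) \rangle]_a^b$, since $\partial\mathcal{L}/\partial\dot q_i = \sum_j g_{ij}\,\dot q_j$.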
The curve $\sigma$ is called a critical point of the action functional $A$ if $DA_\sigma(\delta\sigma)$
vanishes for all variations $\delta\sigma$ vanishing at the endpoints $a$ and $b$, or equivalently
if the Euler-Lagrange equations $\frac{\partial \mathcal{L}}{\partial q_i} - \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q_i} = 0$ are satisfied. Substituting in the
expression for $\mathcal{L}(q, \dot q)$ above, and recalling the definition of the Christoffel symbols,
it is easy to check that $\sigma$ is a critical point of the action functional if and only if it
satisfies Newton's Equations.

It follows that if $\sigma$ is a solution of Newton's Equations then for any variation $\delta\sigma$,
not necessarily vanishing at the endpoints,

$$DA_\sigma(\delta\sigma) = [\langle \sigma'(t), \delta\sigma(t) \rangle]_a^b.$$

As a first application, consider the variation of $\sigma$ defined by $\sigma_\epsilon(t) = \sigma(t + \epsilon)$.
Clearly $\delta\sigma(t) = \sigma'(t)$ and $A(\sigma_\epsilon) = \int_{a+\epsilon}^{b+\epsilon} \mathcal{L}(\sigma')\,dt$, so the definition of $DA_\sigma(\delta\sigma)$
gives $DA_\sigma(\delta\sigma) = [\mathcal{L}(\sigma'(t))]_a^b$, while the above general formula for $DA_\sigma(\delta\sigma)$ when $\sigma$
satisfies (NE) gives $DA_\sigma(\delta\sigma) = [\|\sigma'(t)\|^2]_a^b = [2K(\sigma'(t))]_a^b$.

If we define the Hamiltonian or total energy function $H$ on $TC$ by $H = 2K - \mathcal{L} =
2K - (K - U) = K + U$, then it follows that $[H(\sigma')]_a^b = 0$, or in other words $H$ is
constant along $\sigma'$ whenever $\sigma$ is a solution of Newton's Equations. Now a function
$F$ on $TC$ that is constant along $\sigma'$ whenever $\sigma : [a,b] \to C$ satisfies (NE) is called a
constant of the motion for Newton's Equations, so we have proved:

Conservation of Energy Theorem. The Hamiltonian H = K + U is a constant 
of the motion for Newton's Equations. 

[Here is a more direct proof. $K(\sigma') = \frac{1}{2} g(\sigma', \sigma')$, where $g$ is the metric tensor. By
definition of the Levi-Civita connection, $Dg/dt = 0$, and (NE) says $D\sigma'/dt = -\nabla U$,
so $dK(\sigma')/dt = g(-\nabla U, \sigma') = -dU/dt$.]
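
The Conservation of Energy Theorem can be checked numerically in the simplest setting. The following is a minimal sketch, assuming the flat configuration space $C = \mathbb{R}^2$ (so the Christoffel symbols vanish and (NE) reduces to $d^2x/dt^2 = -\nabla U$). The potential `U`, the initial state, and the classical fourth-order Runge-Kutta integrator are illustrative choices, not taken from the text.

```python
def U(x, y):
    # Hypothetical potential chosen for illustration only.
    return 0.5 * (x * x + y * y) + 0.25 * x * x * y * y

def grad_U(x, y):
    # Gradient of the potential above.
    return (x + 0.5 * x * y * y, y + 0.5 * x * x * y)

def field(s):
    # The second order ODE X^U on TC in flat coordinates: (x, v) -> (v, -grad U).
    x, y, vx, vy = s
    gx, gy = grad_U(x, y)
    return (vx, vy, -gx, -gy)

def rk4_step(s, h):
    # One classical Runge-Kutta step for ds/dt = field(s).
    def shifted(k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = field(s)
    k2 = field(shifted(k1, h / 2))
    k3 = field(shifted(k2, h / 2))
    k4 = field(shifted(k3, h))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def energy(s):
    # H = K + U evaluated on a point of TC.
    x, y, vx, vy = s
    return 0.5 * (vx * vx + vy * vy) + U(x, y)

def max_energy_drift(s0, h=1e-3, steps=5000):
    # Largest deviation of H from its initial value along the computed solution.
    h0 = energy(s0)
    s, drift = s0, 0.0
    for _ in range(steps):
        s = rk4_step(s, h)
        drift = max(drift, abs(energy(s) - h0))
    return drift
```

Calling `max_energy_drift((1.0, 0.0, 0.0, 0.5))` should return a value at the level of the integrator's truncation error, which is the numerical shadow of $H$ being constant along $\sigma'$.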

3. Noether's Principle 

A diffeomorphism $\phi$ of $C$ induces a diffeomorphism $D\phi$ of $TC$, and we call $\phi$ a
symmetry of Newton's Equations if $D\phi$ preserves $\mathcal{L}$, i.e., if $\mathcal{L} \circ D\phi = \mathcal{L}$. In particular,
any isometry of $C$ that preserves $U$ is a symmetry of (NE). We note that if $\phi$ is a
symmetry of (NE) and $\sigma$ is any smooth path in $C$ then $A(\phi \circ \sigma) = A(\sigma)$, and it
follows that $\phi$ permutes the critical points of $A$. Thus if $\sigma$ is a solution of (NE) then
so is $\phi \circ \sigma$. A vector field $Y$ is called an infinitesimal symmetry of Newton's Equations
if it generates a one-parameter group of symmetries of Newton's Equations, so in
particular any Killing vector field that is tangent to the level surfaces of $U$ is an
infinitesimal symmetry of Newton's Equations.

Suppose that $Y$ is any vector field on $C$ generating a one-parameter group of
diffeomorphisms $\phi_t$ of $C$. We associate to $Y$ a function $\hat Y$ on $TC$, called its conjugate
momentum function, by $\hat Y(v) = \langle v, Y_{\Pi(v)} \rangle$. If $\sigma$ is any smooth path in $C$, then we can
generate a variation of $\sigma$ defined by $\sigma_\epsilon(t) = \phi_\epsilon(\sigma(t))$. Then by definition, $\delta\sigma(t) =
Y_{\sigma(t)}$ so, by the above general formula, if $\sigma$ is a solution of Newton's Equations then
$DA_\sigma(\delta\sigma) = [\hat Y(\sigma'(t))]_a^b$. Now suppose $Y$ is an infinitesimal symmetry of Newton's
Equations. Then since $A(\sigma_\epsilon) = A(\phi_\epsilon \circ \sigma) = A(\sigma)$, $DA_\sigma(\delta\sigma)$ is zero by definition,
hence $[\hat Y(\sigma'(t))]_a^b = 0$, i.e., $\hat Y$ is constant along $\sigma'$. This proves:

E. Noether's Principle. The conjugate momentum of an infinitesimal symmetry 
is a constant of the motion. 
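
As an illustration of Noether's Principle, here is a small sketch, again assuming the flat plane $C = \mathbb{R}^2$. The rotation field $Y(x,y) = (-y, x)$ is a Killing field, and its conjugate momentum $\langle v, Y \rangle = x v_y - y v_x$ is the angular momentum. The potential, the parameter `aniso` that breaks rotation invariance, and the velocity Verlet integrator are all hypothetical choices made for this example.

```python
def grad_U(x, y, aniso):
    # Gradient of the hypothetical potential U = (x^2 + (1 + aniso) * y^2) / 2.
    # U is rotation-invariant exactly when aniso == 0.
    return x, (1.0 + aniso) * y

def momentum(x, y, vx, vy):
    # Conjugate momentum <v, Y> of the rotation field Y(x, y) = (-y, x).
    return x * vy - y * vx

def momentum_drift(aniso, state=(1.0, 0.0, 0.0, 0.5), h=1e-3, steps=5000):
    # Largest change of the conjugate momentum along a computed solution of (NE).
    x, y, vx, vy = state
    start = momentum(x, y, vx, vy)
    drift = 0.0
    for _ in range(steps):
        # Velocity Verlet step for d^2 x / dt^2 = -grad U (flat metric assumed).
        gx, gy = grad_U(x, y, aniso)
        vx, vy = vx - 0.5 * h * gx, vy - 0.5 * h * gy
        x, y = x + h * vx, y + h * vy
        gx, gy = grad_U(x, y, aniso)
        vx, vy = vx - 0.5 * h * gx, vy - 0.5 * h * gy
        drift = max(drift, abs(momentum(x, y, vx, vy) - start))
    return drift
```

With `aniso=0.0` the rotations are symmetries and the drift stays at rounding level; with `aniso=1.0` the field $Y$ is no longer tangent to the level curves of $U$ and the momentum visibly changes along the solution.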

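The conjugate momentum to the vector field $\partial/\partial q_i$ is denoted by $P_i$; $P_i =
\sum_j g_{ij}\,\dot q_j = \partial\mathcal{L}/\partial\dot q_i$, and it follows from the non-degeneracy of the inner-product that
we can use $q_1, \dots, q_n, P_1, \dots, P_n$ as coordinates in $\Pi^{-1}(O)$. The fact that Newton's
Equations are equivalent to the Euler-Lagrange equations says that in these
coordinates Newton's Equations take the form: $\frac{dq_i}{dt} = \dot q_i$, $\frac{dP_i}{dt} = \frac{\partial \mathcal{L}}{\partial q_i}$ (i.e., $X^U =
\sum_i \left( \dot q_i \frac{\partial}{\partial q_i} + \frac{\partial \mathcal{L}}{\partial q_i} \frac{\partial}{\partial P_i} \right)$). Since $\sum_i P_i \dot q_i = 2K$, $H = \sum_i P_i \dot q_i - \mathcal{L}$, so

$$dH = \sum_i \left( \dot q_i \, dP_i + P_i \, d\dot q_i - \frac{\partial \mathcal{L}}{\partial q_i} dq_i - \frac{\partial \mathcal{L}}{\partial \dot q_i} d\dot q_i \right) = \sum_i \left( \dot q_i \, dP_i - \frac{\partial \mathcal{L}}{\partial q_i} dq_i \right),$$

or in other words, $\frac{\partial H}{\partial q_i} = -\frac{\partial \mathcal{L}}{\partial q_i}$ and
$\frac{\partial H}{\partial P_i} = \dot q_i$. Thus Newton's Equations take the very simple and symmetric form
(called Hamilton's Equations) $\frac{dq_i}{dt} = \frac{\partial H}{\partial P_i}$, $\frac{dP_i}{dt} = -\frac{\partial H}{\partial q_i}$. Equivalently, the vector
field $X^U$ has the form $X^U = \sum_i \left( \frac{\partial H}{\partial P_i} \frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial}{\partial P_i} \right)$.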

4. The Hamiltonian Viewpoint 

So far we have looked at the dynamics of Newton's Equations on the tangent bundle 
TC of the configuration space. We will refer to this as the Lagrangian viewpoint. 
Since $C$ is Riemannian, there is a canonical bundle isomorphism $L : TC \to T^*C$ of
$TC$ with the cotangent bundle, which in this setting is called the Legendre transformation.
Explicitly, $L(v)(u) = \langle u, v \rangle$. The Hamiltonian viewpoint towards particle
mechanics consists in moving the dynamics over to $T^*C$ via the Legendre transformation.
Remarkably, the transferred dynamics preserves the natural symplectic
structure on $T^*C$, and this fact is the basis for powerful tools for better analyzing
the situation. The functions $\mathcal{L} \circ L^{-1}$ and $H \circ L^{-1}$ are still called the Lagrangian
and Hamiltonian function respectively and will still be denoted by $\mathcal{L}$ and $H$. By
further such abuse of notation we will denote the vector field $DL(X^U)$ on $T^*C$ by
$X^U$.

Just as with the tangent bundle, coordinates $x_1, \dots, x_n$ for $C$ in $O$ define natural
coordinates $q_1, \dots, q_n, p_1, \dots, p_n$ for the cotangent bundle in $\Pi^{-1}(O)$. Namely, $q_i =
x_i \circ \Pi$ as before, while the $p_i$ are defined by $p_i(\ell) = \ell(\partial/\partial x_i)$. It is immediate from
the definitions that $q_i \circ L = q_i$ while $p_i \circ L = P_i$, so it follows from the calculation
above that the vector field $X^U$ (i.e., $DL(X^U)$) on $T^*C$ describing the dynamics of
Newton's Equations is $X^U = \sum_i \left( \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial}{\partial p_i} \right)$.

There is a natural 1-form $\omega$ on $T^*C$; namely if $\ell$ is a cotangent vector of $C$, then
$\omega_\ell = D\Pi^*(\ell)$, or in other words, for $Y$ a tangent vector to $T^*C$ at $\ell$, $\omega_\ell(Y) =
\ell(D\Pi(Y))$, where $\Pi : T^*C \to C$ is the bundle projection. (We note that $\omega$ does not
involve the Riemannian metric, and in fact is natural in the sense that if $\phi$ is any
diffeomorphism of $C$ and $\Phi = (D\phi)^*$ is the induced diffeomorphism of $T^*C$ then
$\Phi^*(\omega) = \omega$.) We define the natural 2-form $\Omega$ on $T^*C$ by $\Omega = d\omega$, so $\Omega$ is exact and
hence closed, i.e., $d\Omega = 0$.

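It is then easy to check that $\omega = \sum_i p_i \, dq_i$ and hence $\Omega = \sum_i dp_i \wedge dq_i$. An
immediate consequence of this is that $\Omega$ is non-degenerate, i.e., the map $v \mapsto i_v\Omega$
is an isomorphism of the tangent bundle of $T^*C$ with its cotangent bundle. (Here
$i_v\Omega(u) = \Omega(v,u)$.) In fact, if $v = \sum_i \left( A_i \frac{\partial}{\partial q_i} + B_i \frac{\partial}{\partial p_i} \right)$ then $i_v\Omega = \sum_i (A_i \, dp_i - B_i \, dq_i)$.
In particular $i_{X^U}\Omega = \sum_i \left( \frac{\partial H}{\partial p_i} dp_i + \frac{\partial H}{\partial q_i} dq_i \right) = dH$.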

Any coordinates $q_1, \dots, q_n, p_1, \dots, p_n$ for $T^*C$ are called "canonical coordinates"
provided $\Omega = \sum_i dp_i \wedge dq_i$. It follows that the "equations of motion" for solutions
of Newton's Equations take the Hamiltonian form: $\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$, $\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}$, for
any such coordinates. If $H$ happens not to involve a particular $q_i$ explicitly, i.e.,
if $H$ is invariant under the one parameter group of translations $q_i \mapsto q_i + \epsilon$, then
this $q_i$ is called a cyclic variable, and its "conjugate momentum" $p_i$ is clearly a
constant of the motion since $\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} = 0$. If we can find canonical coordinates
$q_1, \dots, q_n, p_1, \dots, p_n$ such that all of the $q_i$ are cyclic then we call these variables
action-angle variables, and when such coordinates exist we say that the Hamiltonian
system is completely integrable. The solutions of a completely integrable
system are very easy to describe in action-angle variables. Note that we have $H =
H(p_1, \dots, p_n)$. For each $c$ in $\mathbb{R}^n$ we have a submanifold $\Sigma_c = \{\ell \in T^*C \mid p_i(\ell) = c_i\}$,
and since the $p_i$ are all constants of the motion, these are invariant submanifolds of
the flow. Moreover these submanifolds foliate $T^*C$, and on each of them $q_1, \dots, q_n$
are local coordinates. If we define $\omega_i(c) = \frac{\partial H}{\partial p_i}(c)$, then on $\Sigma_c$ Hamilton's Equations
reduce to $\frac{dq_i}{dt} = \omega_i(c)$, so on $\Sigma_c$ the coordinates $q_i(t)$ of a solution curve are given
by $q_i(t) = q_i(0) + \omega_i(c)\,t$. Frequently the surfaces $\Sigma_c$ are compact, in which case it
is easy to show that each connected component must be an $n$-dimensional torus.
Moreover in practice we can usually determine the $q_i$ to be the angular coordinates
for the $n$ circles whose product defines the torus structure, which helps explain the
terminology action-angle variables.
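
For a concrete instance of action-angle variables, consider the one-dimensional harmonic oscillator $H = \frac{1}{2}(p^2 + \omega^2 q^2)$, an example not discussed in the text. The sketch below assumes the standard change of variables $q = \sqrt{2I/\omega}\,\sin\theta$, $p = \sqrt{2I\omega}\,\cos\theta$, under which $H = \omega I$. In the notation above the angle $\theta$ plays the role of the cyclic coordinate and the action $I$ that of its conjugate momentum. The function names are hypothetical.

```python
import math

def to_action_angle(q, p, omega):
    # Map (q, p) to (theta, I) for H = (p^2 + omega^2 q^2) / 2, so that H = omega * I.
    action = (p * p + omega * omega * q * q) / (2.0 * omega)
    theta = math.atan2(omega * q, p)
    return theta, action

def from_action_angle(theta, action, omega):
    # Inverse map back to the original canonical coordinates (q, p).
    amplitude = math.sqrt(2.0 * action / omega)
    return amplitude * math.sin(theta), amplitude * omega * math.cos(theta)

def flow(q0, p0, omega, t):
    # In action-angle variables the action is constant and the angle advances
    # linearly with frequency dH/dI = omega.
    theta0, action = to_action_angle(q0, p0, omega)
    return from_action_angle(theta0 + omega * t, action, omega)
```

The result of `flow` can be compared with the familiar closed form $q(t) = q_0\cos\omega t + (p_0/\omega)\sin\omega t$, $p(t) = p_0\cos\omega t - \omega q_0\sin\omega t$; the two agree because the angle simply evolves as $\theta(t) = \theta(0) + \omega t$.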

Later we will look in more detail at the problem of determining whether a Hamil- 
tonian system is completely integrable. 

5. Symplectic Manifolds 

The cotangent bundle of a manifold is the model for what is called a symplectic
manifold. Namely, a symplectic manifold is a smooth manifold $P$ together with
a closed non-degenerate 2-form $\Omega$ on $P$. If $F : P \to \mathbb{R}$ is a smooth real-valued
function on $P$ then there is a uniquely determined vector field $X$ on $P$ such that
$i_X\Omega = dF$, and we call $X$ the symplectic gradient of $F$ and denote it by $\nabla_s F$. Thus
we can state our observation above by saying that the vector field $X^U$ on $T^*C$ is
the symplectic gradient of the Hamiltonian function: $X^U = \nabla_s H$.

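By an important theorem of Darboux ([Ar], Chapter 8), in the neighborhood of
any point of $P$ there exist "canonical coordinates" $q_1, \dots, q_n, p_1, \dots, p_n$ in which $\Omega$
has the form $\sum_i dp_i \wedge dq_i$, and in these coordinates $\nabla_s H = \sum_i \left( \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial}{\partial p_i} \right)$; or
equivalently the solution curves of $\nabla_s H$ satisfy Hamilton's equations $\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$,
$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}$.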

Before considering Poisson brackets on symplectic manifolds, we first make a 
short digression to review Lie derivatives. Recall that if X is a smooth vector field 
on a smooth manifold $M$, generating a flow $\phi_t$, and if $T$ is any smooth tensor
field on $M$, then the Lie derivative of $T$ with respect to $X$ is the tensor field
$\mathcal{L}_X T = \frac{d}{dt}\big|_{t=0} \phi_t^*(T)$. If $\mathcal{L}_X T = 0$ then we shall say that "X preserves T", for
this is the necessary and sufficient condition that the flow $\phi_t$ preserve $T$, i.e., that
$\phi_t^*(T) = T$ for all $t$. There is a famous formula of Cartan for the Lie derivative
operator $\mathcal{L}_X$ restricted to differential forms, identifying it with the anti-commutator
of the exterior derivative operator $d$ and the interior product operator $i_X$:

$$\mathcal{L}_X = d\,i_X + i_X d.$$

If $\theta$ is a closed $p$-form this gives $\mathcal{L}_X\theta = d(i_X\theta)$, so $X$ preserves $\theta$ if and only if
the $(p-1)$-form $i_X\theta$ is closed. In particular this demonstrates the important fact
that a vector field $X$ on a symplectic manifold $P$ is symplectic (i.e., preserves the
symplectic form, $\Omega$) if and only if $i_X\Omega$ is a closed 1-form (and hence, at least locally,
the differential of a smooth function). The well known identity $\mathcal{L}_{[X,Y]} = [\mathcal{L}_X, \mathcal{L}_Y]$
implies that the space of symplectic vector fields on $P$ is a Lie algebra, which we can
think of as the Lie algebra of the group of symplectic diffeomorphisms of $P$. It is
an interesting and useful fact that the space of Hamiltonian vector fields on $P$, i.e.,
those for which $i_X\Omega$ is an exact form, $dF$, is not only a linear subspace, but is even
a Lie subalgebra of the symplectic vector fields, and moreover the commutator
subalgebra of the symplectic vector fields is included in the Hamiltonian vector
fields. To demonstrate this we shall show that if $i_X\Omega$ and $i_Y\Omega$ are closed forms,
then $i_{[X,Y]}\Omega$ is not only closed but even exact, and in fact it is the differential of
the function $\Omega(Y,X)$. First, using the fact that Lie derivation satisfies a Leibnitz
formula with respect to any natural bilinear operation on tensors (so in particular
with respect to the interior product), $\mathcal{L}_X(i_Y\Omega) = i_{(\mathcal{L}_X Y)}\Omega + i_Y(\mathcal{L}_X\Omega)$. Thus, since
$\mathcal{L}_X Y = [X,Y]$ and $\mathcal{L}_X\Omega = 0$, $\mathcal{L}_X(i_Y\Omega) = i_{[X,Y]}\Omega$. Finally, since $d(i_Y\Omega) = 0$,
Cartan's formula for $\mathcal{L}_X(i_Y\Omega)$ gives $i_{[X,Y]}\Omega = d\,i_X(i_Y\Omega) = d(\Omega(Y,X))$.
Remark. It is possible to prove Cartan's Formula by an ugly, brute force calculation
of both sides, but there is also an elegant, no-sweat proof that I first learned
from S. S. Chern (when I proudly showed him my version of the ugly proof). There
is an important involutory automorphism $\omega \mapsto \bar\omega$ of the algebra $A$ of differential
forms on a manifold. Namely, it is the identity on forms of even degree and is
minus the identity on forms of odd degree. A linear map $\partial : A \to A$ is called
an anti-derivation if $\partial(\lambda\omega) = \partial\lambda \wedge \omega + \bar\lambda \wedge \partial\omega$. It is of course well-known that
the exterior derivative, $d$, is an anti-derivation (of degree $+1$) and an easy check
shows that the interior product $i_X$ is an anti-derivation (of degree $-1$). Moreover,
the anti-commutator of two anti-derivations is clearly a derivation, so that $\mathcal{L}_X$ and
$d\,i_X + i_X d$ are both derivations of $A$, and hence to prove they are equal it suffices
to check that they agree on a set of generators of $A$. But $A$ is generated by forms
of degree zero (i.e., functions) and the differentials of functions, and it is obvious
that $\mathcal{L}_X$ and $d\,i_X + i_X d$ agree on these.

We shall also have to deal with symplectic structures on infinite dimensional
manifolds. In this case we still require that $\Omega$ is a closed form and we also still
require that $\Omega$ is weakly non-degenerate, meaning that for each point $p$ of $P$, the
map $v \mapsto i_v\Omega$ of $TP_p$ to $TP_p^*$ is injective. In finite dimensions this of course
implies that $\Omega$ is strongly non-degenerate (meaning that the latter map is in fact
an isomorphism), but that is rarely the case in infinite dimensions, so we will not
assume it. Thus, if $F$ is a smooth function on $P$, it does not automatically follow
that there is a symplectic gradient vector field $\nabla_s F$ on $P$ satisfying $\Omega((\nabla_s F)_p, v) =
dF_p(v)$ for all $v$ in $TP_p$; this must be proved separately. However, if a symplectic
gradient does exist, then weak non-degeneracy shows that it is unique. In the
infinite dimensional setting we call a function $F : P \to \mathbb{R}$ a Hamiltonian function
if it has a symplectic gradient, and vector fields of the form $\nabla_s F$ will be called
Hamiltonian vector fields. Obviously the space of Hamiltonian functions is linear,
and in fact the formula $d(FG) = F\,dG + G\,dF$ shows that it is even an algebra, and
that $\nabla_s(FG) = F\,\nabla_s G + G\,\nabla_s F$. We shall call a vector field $X$ on $P$ symplectic if
the 1-form $i_X\Omega$ is closed but not necessarily exact, for as we have seen, this is
the condition for the flow generated by $X$ to preserve $\Omega$.

Of course if $P$ is a vector space the distinction between Hamiltonian and symplectic
disappears: if $i_X\Omega$ is closed, then $H(p) = \int_0^1 \Omega_{tp}(X_{tp}, p)\,dt$ defines a Hamiltonian
function with $\nabla_s H = X$. Moreover, in this case it is usually straightforward
to check if $i_X\Omega$ is closed. Given $u, v$ in $P$, consider them as constant vector fields
on $P$, so that $[u,v] = 0$. Then the formula $d\theta(u,v) = u(\theta(v)) - v(\theta(u)) - \theta([u,v])$
for the exterior derivative of a 1-form shows that symmetry of $\frac{d}{dt}\big|_{t=0} \Omega(X_{p+tu}, v)$ in
$u$ and $v$ is necessary and sufficient for $i_X\Omega$ to be closed (and hence exact). In case
$\Omega$ is a constant form (i.e., $\Omega_p(u,v)$ is independent of $p$) then $\frac{d}{dt}\big|_{t=0} \Omega(X_{p+tu}, v) =
\Omega((DX_p)(u), v)$, where $(DX)_p(u) = \frac{d}{dt}\big|_{t=0} X_{p+tu}$ is the differential of $X$ at $p$. Since
$\Omega$ is skew-symmetric in $u$ and $v$, this shows that if $\Omega$ is constant then $X$ is Hamiltonian
if and only if $(DX)_p$ is "skew-adjoint" with respect to $\Omega$.
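
The skew-adjointness criterion is easy to test for linear vector fields. The sketch below assumes $P = \mathbb{R}^{2n}$ with a constant form $\Omega(u,v) = u^T J v$ for a fixed skew-symmetric matrix $J$, and a linear field $X_p = Ap$, so that $(DX)_p = A$ and $\Omega(Au, v) = u^T (A^T J) v$. The criterion then says that $X$ is Hamiltonian exactly when the matrix $A^T J$ is symmetric. The helper names and the two sample matrices are illustrative assumptions.

```python
def transpose(a):
    # Transpose of a matrix stored as a list of rows.
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    # Plain matrix product for small dense matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_hamiltonian_linear(A, J, tol=1e-12):
    # X_p = A p is Hamiltonian for Omega(u, v) = u^T J v
    # iff Omega(A u, v) = u^T (A^T J) v is symmetric in u and v.
    M = matmul(transpose(A), J)
    n = len(M)
    return all(abs(M[i][j] - M[j][i]) <= tol for i in range(n) for j in range(n))

# Coordinates (q, p) on R^2 with a constant skew form.
J = [[0.0, 1.0], [-1.0, 0.0]]
oscillator = [[0.0, 1.0], [-1.0, 0.0]]   # dq/dt = p, dp/dt = -q
damped = [[0.0, 1.0], [-1.0, -0.1]]      # adds a friction term -0.1 * p
```

Here `is_hamiltonian_linear(oscillator, J)` holds while `is_hamiltonian_linear(damped, J)` fails, reflecting the fact that friction does not preserve the symplectic form.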

If two smooth real-valued functions $F_1$ and $F_2$ on a symplectic manifold $P$ are
Hamiltonian, i.e., if they have symplectic gradients $\nabla_s F_1$ and $\nabla_s F_2$, then they
determine a third function on $P$, called their Poisson bracket, defined by:

$$\{F_1, F_2\} = \Omega(\nabla_s F_2, \nabla_s F_1).$$

The formula $i_{[X,Y]}\Omega = d(\Omega(Y,X))$ shows that the Poisson bracket is also a Hamiltonian
function, and in fact

$$\nabla_s\{F_1, F_2\} = [\nabla_s F_1, \nabla_s F_2].$$

What this formula says is that Hamiltonian functions $F : P \to \mathbb{R}$ are not only
a commutative and associative algebra under pointwise product, but also a Lie
algebra under Poisson bracket, and $F \mapsto \nabla_s F$ is a Lie algebra homomorphism of
this Lie algebra onto the Lie algebra of Hamiltonian vector fields on $P$. In particular,
we see that the Poisson bracket satisfies the Jacobi identity,

$$\{\{F_1, F_2\}, F_3\} + \{\{F_2, F_3\}, F_1\} + \{\{F_3, F_1\}, F_2\} = 0,$$

and the Leibnitz Rule $\nabla_s(FG) = F\,\nabla_s G + G\,\nabla_s F$ gives:

$$\{F_1, F_2 F_3\} = \{F_1, F_2\} F_3 + F_2 \{F_1, F_3\},$$

which we will also call the Leibnitz Rule.

Remark. A Poisson structure for a smooth manifold is defined to be a Lie algebra 
structure {F, G} on the algebra of smooth functions that satisfies the Leibnitz Rule. 

Since $\{F_1, F_2\} = \Omega(\nabla_s F_2, \nabla_s F_1) = dF_2(\nabla_s F_1) = \nabla_s F_1(F_2)$, we can interpret the
Poisson bracket of $F_1$ and $F_2$ as the rate of change of $F_2$ along the solution curves
of the vector field $\nabla_s F_1$. If we are considering some fixed Hamiltonian system
$\frac{dx}{dt} = (\nabla_s H)_x$ on $P$, then we can write this as $\frac{dF}{dt} = \{H, F\}$, and we see that the
vanishing of the Poisson bracket $\{H, F\}$ is the necessary and sufficient condition
for $F$ to be a constant of the motion. By the Jacobi Identity, a corollary to this
observation is that the Poisson Bracket of two constants of the motion is also a
constant of the motion. And since $\{H, H\} = 0$, $H$ itself is always a constant of the
motion. (This is a proof of conservation of energy from the Hamiltonian point of
view, and below we will also see how to prove Noether's Theorem in the Hamiltonian
framework.)
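
The bracket criterion for constants of the motion can be tried out numerically. The sketch below assumes canonical coordinates $z = (q_1, \dots, q_n, p_1, \dots, p_n)$, in which the identity $\{F_1, F_2\} = \nabla_s F_1(F_2)$ together with the coordinate form of the symplectic gradient gives $\{F, G\} = \sum_i \left( \frac{\partial F}{\partial p_i} \frac{\partial G}{\partial q_i} - \frac{\partial F}{\partial q_i} \frac{\partial G}{\partial p_i} \right)$. Partial derivatives are approximated by central differences, and the Hamiltonian `H` and angular momentum `J` are hypothetical examples.

```python
def partial(f, z, k, h=1e-5):
    # Central-difference approximation to the k-th partial derivative of f at z.
    zp, zm = list(z), list(z)
    zp[k] += h
    zm[k] -= h
    return (f(zp) - f(zm)) / (2 * h)

def poisson(F, G, z):
    # {F, G} = sum_i dF/dp_i * dG/dq_i - dF/dq_i * dG/dp_i, with z = (q, p).
    n = len(z) // 2
    total = 0.0
    for i in range(n):
        total += (partial(F, z, n + i) * partial(G, z, i)
                  - partial(F, z, i) * partial(G, z, n + i))
    return total

def H(z):
    # Hypothetical rotation-invariant Hamiltonian on T*R^2.
    q1, q2, p1, p2 = z
    r2 = q1 * q1 + q2 * q2
    return 0.5 * (p1 * p1 + p2 * p2) + 0.5 * r2 + 0.25 * r2 * r2

def J(z):
    # Angular momentum, the conjugate momentum of the rotation field.
    q1, q2, p1, p2 = z
    return q1 * p2 - q2 * p1
```

At a sample point such as `z = [0.3, -0.7, 1.1, 0.4]`, `poisson(H, J, z)` vanishes up to discretization error, so `J` is a constant of the motion, while `poisson(H, lambda z: z[0], z)` returns approximately `p1`, in agreement with $dq_1/dt = \{H, q_1\} = \partial H/\partial p_1$.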

Since the Poisson bracket is skew-symmetric, $\{F_1, F_2\}$ is zero if and only if
$\{F_2, F_1\}$ is zero, and in this case we say that $F_1$ and $F_2$ are in involution. More
generally $k$ Hamiltonian functions $F_1, \dots, F_k$ are said to be in involution if all of
the Poisson brackets $\{F_i, F_j\}$ vanish. Note that since $\nabla_s\{F_i, F_j\} = [\nabla_s F_i, \nabla_s F_j]$, if
the $F_i$ are in involution then the vector fields $\nabla_s F_i$ commute, i.e., $[\nabla_s F_i, \nabla_s F_j] = 0$,
or equivalently the flows they generate commute. In particular we see that if
$F_1, \dots, F_n$ are in involution and if each $\nabla_s F_i$ generates a one parameter group of
diffeomorphisms $\phi^i_t$ of $P$ then $(t_1, \dots, t_n) \mapsto \phi^1_{t_1} \circ \phi^2_{t_2} \circ \dots \circ \phi^n_{t_n}$ defines a symplectic
action of the abelian group $\mathbb{R}^n$ on $P$.

Suppose $P$ is a symplectic manifold of dimension $2n$ and that there exist $n$ functions
$F_i$ such that the $dF_i$ are everywhere linearly independent. If the functions $F_i$
are in involution with each other and with a function $H$, then the so-called Arnold-Liouville
Theorem ([Ar], Chapter 10) states that the Hamiltonian system $\nabla_s H$ is
completely integrable in the sense mentioned earlier, i.e., there exist action-angle
variables $q_1, \dots, q_n, p_1, \dots, p_n$. In fact, complete integrability of a $2n$ dimensional
Hamiltonian system is often defined as the existence of $n$ functionally independent
constants of the motion in involution.

This leads naturally to two interesting problems: finding ways to construct symplectic
manifolds with lots of functions in involution, and determining whether a given
Hamiltonian system is completely integrable. In the late 1970's M. Adler [Ad],
B. Kostant [Kos], and W. Symes [Sy] independently and nearly simultaneously
found a beautiful approach to the first question using certain special splittings
of Lie algebras. For excellent surveys of finite dimensional completely integrable
systems see [AdM] and [Pe]. The Adler-Kostant-Symes Theorem is explained in
detail in both of these references, and we shall not discuss it further here, except
to note that it is closely related to an earlier method of Peter Lax [La1], that will
be one of our main tools in later sections, and that, as Adler's paper showed, the
Adler-Kostant-Symes Theorem also applies to infinite dimensional systems. In fact
Adler's paper applied the method to the KdV equation, and later many other PDE
were treated by the A-K-S approach in [Dr], [DS], [RS], [Se1], [Se2], and [Te2].

As for the second problem, there is no magic test to check if a given system is 
completely integrable, and the principal technique is to try to show that it can be 
manufactured using the Adler-Kostant-Symes method. In fact, one often hears it 
said that "all known completely integrable systems arise in this way".

If a symplectic structure $\Omega$ is "exact", i.e., if $\Omega = d\omega$ for some 1-form $\omega$ on $P$ (as
we saw was the case for a cotangent bundle), and if a vector field $X$ not only preserves
$\Omega$, but even preserves $\omega$, then Cartan's formula gives $0 = L_X\omega = d\,i_X\omega + i_X\Omega$, so if
we define $X^\omega = -i_X\omega = -\omega(X)$, then $\nabla_s(X^\omega) = X$. If $Y$ is a second such vector
field on $P$, then a computation completely analogous to that for $i_{[X,Y]}\Omega$ above
(replacing $\Omega$ by $\omega$) gives $[X,Y]^\omega = \omega([Y,X]) = i_{[Y,X]}\omega = i_Y d(i_X\omega) = -dX^\omega(Y) =
-dX^\omega(\nabla_s Y^\omega) = \{X^\omega, Y^\omega\}$. Thus $X \mapsto X^\omega$ is a Lie algebra homomorphism inverse
to $F \mapsto \nabla_s F$ from the Lie algebra of vector fields preserving $\omega$ to the Lie algebra
of Hamiltonian functions under Poisson bracket.

In particular, going back to Newton's Equations on our configuration space $C$,
we see that if $X$ is a Killing vector field on $C$ such that $XU = 0$ then $\omega(X)$ is a
constant of the motion for Newton's Equations. It is easy to see that $\omega(X)$ is just
the conjugate momentum of $X$, so this gives a proof of Noether's Principle in the
Hamiltonian framework.

6. Examples of Classical Mechanical Systems 

While any choice of potential function U on any Riemannian manifold C defines 
a "Classical Mechanical System", in some generalized sense, this name is often 
reserved for certain more special cases that arise from physical considerations. 

One important and interesting class of examples describes the motion of rigid 
bodies or "tops" with no external forces acting. Here the configuration space $C$ is
the rotation group $SO(3)$, while the metric tensor (also called the Inertia tensor in
this case) is any left-invariant metric on $C$, and $U = 0$. We refer the reader to any
book on Classical Mechanics (e.g., [AbM], [Ar]) for a discussion of these examples,
but be warned that the full theory is covered in a multi-volume treatise [KS]. An
excellent recent book is [Au].

A second important class of examples, usually referred to as "particle mechanics",
describes the motion under mutual forces of $N$ particles in the Euclidean space $\mathbf{R}^k$
(where usually $k = 1, 2$, or $3$). In this case $C = (\mathbf{R}^k)^N$, a point $x = (x_1, \ldots, x_N)$
of $C$ representing the positions of $N$ particles. For an important subclass, the force
on each particle is the sum of forces exerted on it by the remaining particles. In
this case the potential $U$ is a function of the distances $\|x_i - x_j\|$ separating
the particles. It follows that the Lie group $G$ of Euclidean motions of $\mathbf{R}^k$ is a
group of symmetries, so the conjugate momenta of the Lie algebra of $G$ give $k$
linear momenta (from the translations) and $k(k-1)/2$ angular momenta (from
the rotations) that are conserved quantities.

A simple but important example from particle mechanics is the "harmonic
oscillator". Here $k = N = 1$, so $C = \mathbf{R}$, the metric on $TC = \mathbf{R} \times \mathbf{R}$ is given by
$\|(x,v)\|^2 = mv^2$ (where $m$ is the mass of the oscillator) and $U(x) = \frac{1}{2}kx^2$, where
$k > 0$ is the so-called spring constant of the oscillator. This models a particle that is
in equilibrium at the origin, but which experiences a Hooke's Law linear "restoring
force" $-kx$ directed towards the origin when it is at the point $x$ in
$C$. Newton's Equation of motion is $m\ddot{x} = -kx$, and the solutions are of the form
$x(t) = A\cos(\omega(t - t_0))$, where the angular frequency $\omega$ is $\sqrt{k/m}$. The Hamiltonian
formulation of the harmonic oscillator is given in terms of canonical variables $q = x$
and $p = m(dx/dt)$ by $H(q,p) = \frac{1}{2}(p^2/m + kq^2)$. Note that $P = \frac{1}{2}(p^2 + mkq^2)$ and
$Q = \arctan(p/(q\sqrt{mk}))$ define action-angle variables for the harmonic oscillator.

Only notationally more complicated is the case of $N$ uncoupled harmonic
oscillators, with masses $m_1, \ldots, m_N$ and spring constants $k_1, \ldots, k_N$. Now $C = \mathbf{R}^N$,
the metric on $TC = \mathbf{R}^N \times \mathbf{R}^N$ is given by $\|(x,v)\|^2 = \sum_i m_i v_i^2$, and the potential
function is $U(x) = \frac{1}{2}\sum_i k_i x_i^2$. Newton's Equations are $m_i\ddot{x}_i = -k_i x_i$ with the
solutions $x_i(t) = A_i\cos(\omega_i(t - t_0^i))$, where $\omega_i = \sqrt{k_i/m_i}$. The Hamiltonian for this
example is $H(q,p) = \sum_i \frac{1}{2}(p_i^2/m_i + k_i q_i^2)$. Note that not only is the total
Hamiltonian, $H$, a constant of the motion, but so also are the $N$ partial Hamiltonians,
$H_i(q,p) = \frac{1}{2}(p_i^2/m_i + k_i q_i^2)$; i.e., the sum of the kinetic and potential energy of
each individual oscillator is preserved during the time evolution of any solution. In
this case we get one pair of action-angle variables from the action-angle variables
for each of the individual harmonic oscillators, so it is again completely integrable.

A seemingly more complicated example is the case of $N$ coupled harmonic
oscillators. Starting from the previous example, we imagine adding Hooke's Law springs
with spring constants $K_{ij}$ joining the $i$-th and $j$-th particles. The force on the $i$-th
particle is now $F_i = -k_i x_i - \sum_j K_{ij}(x_i - x_j)$, so we can take as our potential function
$U(x) = \frac{1}{2}\sum_i k_i x_i^2 + \frac{1}{2}\sum_{i<j} K_{ij}(x_i - x_j)^2$. Notice that this is clearly a positive
definite quadratic form, so without loss of generality we can consider the somewhat
more general potential function $U(x) = \frac{1}{2}\sum_{ij} k_{ij} x_i x_j$, where $k_{ij}$ is a positive
definite symmetric matrix. Newton's Equations are now $m_i\ddot{x}_i = -\sum_j k_{ij} x_j$. Because
of the off-diagonal elements of $k_{ij}$ (the so-called "coupling constants") Newton's
Equations no longer have separated variables, and integrating them appears much
more difficult. This is of course an illusion; all that is required to reduce this case
to the case of uncoupled harmonic oscillators is to diagonalize the quadratic form
that gives the potential energy, i.e., find a basis $e_1, \ldots, e_n$, orthonormal with respect
to the kinetic energy metric, such that if
$y = y_1 e_1 + \cdots + y_n e_n$ then $U(y) = \frac{1}{2}\sum_i \lambda_i y_i^2$. The solutions of Newton's Equations
are now all of the form $\sum_i A_i\cos(\sqrt{\lambda_i}(t - t_0^i))e_i$. Solutions for which one $A_i$ is
non-zero and all the others are zero are referred to as "normal modes" of the coupled
harmonic oscillator system. Since the coupled harmonic oscillator system is just the
uncoupled system in disguise, we see that it also is completely integrable. Moreover,
when we express a solution $x(t)$ of Newton's Equations as a sum of normal modes,
then not only is the kinetic energy plus the potential energy of $x(t)$ a constant of
the motion, but the kinetic plus the potential energy of each of these normal
modes is also a constant of the motion.
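
The reduction to normal modes can be illustrated numerically. The following is only a
minimal sketch under stated assumptions: two equal masses (the values of m, k and K are
hypothetical, chosen for illustration), each tied to the origin by a spring k and joined
to each other by a coupling spring K, integrated with the velocity-Verlet scheme (a choice
made here, not something prescribed by the text). For this special case the diagonalizing
basis is the in-phase and out-of-phase pair, and the energy computed separately in each
mode should stay essentially constant along a solution.

```python
import math

# Hypothetical parameters: two equal masses m, each held by a spring k,
# and joined to each other by a coupling spring K.
m, k, K = 1.0, 1.0, 0.5

def accel(x):
    """Newton's Equations: m x_i'' = -k x_i - K (x_i - x_j)."""
    x1, x2 = x
    return [(-k * x1 - K * (x1 - x2)) / m,
            (-k * x2 - K * (x2 - x1)) / m]

# Orthonormal basis diagonalizing U = (k/2)(x1^2 + x2^2) + (K/2)(x1 - x2)^2.
s = 1.0 / math.sqrt(2.0)
modes = [(s, s), (s, -s)]      # in-phase and out-of-phase directions
lams = [k, k + 2.0 * K]        # U(y) = (1/2) * sum_i lam_i * y_i^2

def mode_energies(x, v):
    """Kinetic plus potential energy carried by each normal mode."""
    out = []
    for e, lam in zip(modes, lams):
        y = e[0] * x[0] + e[1] * x[1]   # coordinate along the mode
        w = e[0] * v[0] + e[1] * v[1]   # velocity along the mode
        out.append(0.5 * m * w * w + 0.5 * lam * y * y)
    return out

def simulate(x, v, dt=1e-3, steps=20000):
    """Velocity-Verlet integration; returns mode energies at start and end."""
    x, v = list(x), list(v)
    start = mode_energies(x, v)
    a = accel(x)
    for _ in range(steps):
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        a = accel(x)
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
    return start, mode_energies(x, v)

# Displace only the first particle, so both modes are excited.
start, end = simulate([1.0, 0.0], [0.0, 0.0])
print("mode energies at start:", start)
print("mode energies at end:  ", end)
```

Displacing only the first particle excites both modes at once, yet the two printed pairs
should agree up to the small discretization error of the integrator, even though energy
is visibly traded back and forth between the two particles themselves.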

There are two properties of the coupled harmonic oscillators that make it an 
exceptionally important model system. First, it is exactly and explicitly solvable, 
and secondly, as we shall see in the next section, it is an excellent first approximation 
to what happens in an arbitrary system near a so-called "vacuum solution", i.e., a 
stable equilibrium. 

7. Physics Near Equilibrium 

Physical systems are normally close to equilibrium, so it is important to analyze 
well what happens in the phase space of a physical system in the near neighborhood 
of an equilibrium point. 

We shall assume that our system is described as above by a potential U on a 
configuration space C. By an "equilibrium point" we mean a point p of C that is 
not just a critical point of U, but in fact a non-degenerate local minimum. Since U 
is only determined up to an additive constant, we can assume that $U(p) = 0$. Since
$\nabla U$ vanishes at $p$, it is clear that $\sigma(t) = p$ is a solution of Newton's Equations, and
physicists sometimes refer to such a solution as a "vacuum solution".

By a famous result of Marston Morse, we can find local coordinates $y_1, \ldots, y_n$,
in a neighborhood $O$ of $p$ and centered at $p$ such that $U(q) = \sum_i y_i(q)^2$, so that
$N(\epsilon) = \{q \in O \mid U(q) < \epsilon\}$ is a neighborhood basis for $p$. It follows that a vacuum
solution is stable, i.e., a solution of Newton's Equations with initial conditions
sufficiently close to those of a vacuum solution will remain close to the vacuum
solution for all time. To be precise, suppose $\gamma(t)$ is a solution of Newton's Equations
such that $\gamma(0)$ is in $N(\frac{1}{2}\epsilon)$ and $K(\gamma'(0)) < \frac{1}{2}\epsilon$. Then $U(\gamma(0)) + K(\gamma'(0)) < \epsilon$, so
that, by conservation of total energy, $U(\gamma(t)) + K(\gamma'(t)) < \epsilon$ for all $t$, and since $K$
is non-negative, $U(\gamma(t)) < \epsilon$ for all $t$, i.e., the solution $\gamma(t)$ remains inside $N(\epsilon)$.

But we can be much more precise about the nature of these solutions that are
near the vacuum. To simplify the exposition somewhat we will make the (inessential)
assumption that the metric on $C$ is flat, as it usually is in particle mechanics.
Then we can choose orthogonal coordinates $x_1, \ldots, x_n$ centered at $p$ that
simultaneously diagonalize both the kinetic energy and the Hessian matrix of $U$ at $p$, and
the assumption that $p$ is a non-degenerate local minimum just means that the
diagonal elements, $k_i$, of the Hessian are positive. (The diagonal elements, $m_i$, of the
kinetic energy are of course also positive, and have the interpretation of masses.)
Thus, by Taylor's Theorem, $U(x) = \frac{1}{2}\sum_j k_j x_j^2 + \frac{1}{3}\sum_{jkl} a_{jkl}(x) x_j x_k x_l$, where the
functions $a_{jkl}(x)$ are smooth and symmetric in their last two indices, and Newton's
Equations take the form:

(For later reference, we note that if we adopt a Hamiltonian viewpoint, and move
to the cotangent bundle using the Legendre transform, then in the canonical
symplectic coordinates associated to $x_1, \ldots, x_n$, the kinetic energy $K$ is given by
$K = \frac{1}{2}\sum_i p_i^2/m_i$, the potential energy is $U = \frac{1}{2}\sum_j k_j q_j^2 + \frac{1}{3}\sum_{jkl} a_{jkl}(q) q_j q_k q_l$, and
the Hamiltonian is $H = K + U$.)

The system of uncoupled harmonic oscillators obtained by dropping the nonlinear 
terms is called the "linearized system" (at the given equilibrium p) and its normal 
modes are referred to by physicists as the "degrees of freedom" of the system. 

An obvious question is, "To what extent do solutions of the linearized system 
approximate those of the full system?". One answer is easy, and no surprise — 
Gronwall's Inequality implies that, as the initial position tends to p and the initial
velocity tends to zero, a solution of the linearized equation approximates that of 
the full equation better and better, and for a longer period of time. 

A more subtle, but also more interesting question is, "How will the kinetic and
potential energy of a solution become distributed, on average, among the various
degrees of freedom of the full system?". It is not difficult to give a precise
formulation of this question. The kinetic energy in the $i$-th mode is clearly
$K_i = p_i^2/(2m_i)$, and it is natural to assign to the $i$-th mode the potential energy
$U_i = \frac{1}{2}k_i q_i^2 + \frac{1}{3}\sum_{kl} a_{ikl}(q) q_i q_k q_l$. Then $H_i = K_i + U_i$ is that part of the total
energy in the $i$-th mode, and the total energy $H$ is just the sum of these $H_i$. We
know that for the linearized system each of the $H_i$ is a constant of the motion; that
is, $H_i$ is constant along any solution of Newton's Equations. But it is easy to see
that this cannot be true for the full system, and energy will in general flow between the
normal modes because of the nonlinear coupling between them. The question is,
will the "average behavior" of the $H_i$ and $K_i$ have some predictable relationship
over large time intervals?

To make the concept of "average" precise, given any function $F : TC \to \mathbf{R}$, define
its "time average", $\bar{F}$, along a given solution $x(t)$ by: $\bar{F} = \lim_{T \to \infty} \frac{1}{T}\int_0^T F(\dot{x}(t))\,dt$
(regarding the velocity vector $\dot{x}(t)$ as a point of $TC$).
Then, what can we say about the time averages of the above partial energy
functions and their relations to each other? Of course a first question is whether the
limit defining the time average really exists, and this is already a non-trivial point.
Fortunately, as we shall see in the next section, it is answered by the "Individual
Ergodic Theorem" of G. D. Birkhoff, [Bi], according to which the time average will
exist for "almost all" initial conditions.

Starting in the late Nineteenth Century, physicists such as Maxwell, Boltzmann, 
and Gibbs developed a very sophisticated theory of statistical mechanics that gave 
convincing explanations for (and good predictions of) the behavior of large assem- 
blages of molecules. The theoretical foundations for this theory were based on just 
such time averages and their hypothesized equality with another kind of average 
that is easier to investigate, so-called "space averages", or "microcanonical
averages". As we will see, the space average of the kinetic energy in each normal mode
is the same, a fact referred to as "equipartition of energy". This important fact is
the very basis for the definition of temperature in statistical mechanics. Namely,
for a system near equilibrium, if the absolute temperature is $T$, then the average
kinetic energy in each degree of freedom is $kT/2$, where $k$ is the so-called Boltzmann
constant.

But it is the time averages of the kinetic energy that should really determine
the temperature. If energy equipartition holds for time averages, and if the
system is experimentally started in one of its normal modes and is then followed in
time, one should see an equilibration take place, in which the kinetic energy
gradually flows out of the original single mode in which it was concentrated and
becomes equally divided (on average) among all the various degrees of freedom of
the system. Because of the above relation between temperature and equipartition
of energy, this hypothesized equilibration process is referred to as "thermalization".
Intuitively speaking, this refers to the transformation of the large scale motion of
the system in a single mode into "heat", i.e., lots of tiny fluctuating bits of energy
of amount $kT/2$ in each of the many degrees of freedom.

It should now be clear why physicists placed so much emphasis on proving the 
supposed equality of the time average and the microcanonical average, but
mathematically this proved to be a highly intractable problem. There were heuristic
proofs, based on vague physical reasoning, and also semi-rigorous arguments based
on so-called "ergodic hypotheses". The latter were assumptions to the effect that
the solution curves would wander on an energy surface in a sufficiently space filling 
way (ergodic comes from the Greek word for energy). Unfortunately these ergod- 
icity assumptions were vague and in certain cases topologically impossible, and it 
was only with the development of measure theory that von Neumann and Birkhoff 
were able to state the precise condition ("metric transitivity") under which one 
could prove that time and space averages must necessarily coincide. 

Nevertheless, physicists were morally convinced of the correctness of the time- 
average based concept of thermalization; so much so that when Fermi, Pasta, and 
Ulam undertook the numerical experiments that we will consider later, they stated 
that their goal was not so much to discover if there would be thermalization,
but rather to discover experimentally what the rate of approach to thermalization 
would be! 

For those readers who are interested, we will provide more of the mathematical 
details concerning equipartition of energy in the next section. 

8. Ergodicity and Thermalization 

Let $P$ be a symplectic manifold (say of dimension $2n$) with symplectic 2-form $\Omega$,
and let $H$ denote a Hamiltonian function on $P$, generating a symplectic flow $\phi_t$;
that is, the infinitesimal generator of $\phi_t$ is $\nabla_s H$, the symplectic gradient of $H$. As
we have seen, this implies that the flow $\phi_t$ preserves the symplectic structure, and
also that $H$ is a "constant of the motion", meaning that it is constant along every
orbit, $\phi_t(p)$, or equivalently, that the constant energy hypersurfaces $\Sigma_c$ (defined
by $H = c$) are invariant under the flow. In classical examples, the Hamiltonian is
usually bounded below and proper (so that all the $\Sigma_c$ are compact) and we shall
assume this in what follows. Since $H$ is only defined up to an additive constant we
can assume the minimum value of $H$ is zero.

The $2n$-form $\Omega^n$ defines a measure $d\mu$ on $P$ (the Liouville measure), and this is
of course invariant under the flow. We can factor $\Omega^n$ as $\Omega^n = \lambda \wedge dH$, and the $(2n-1)$-form
$\lambda$ is uniquely determined modulo the ideal generated by $dH$, so it induces a
unique measure on each energy hypersurface $\Sigma_c$. We will denote these measures by
$d\sigma$, and they are of course likewise invariant under the flow. Since $\Sigma_c$ is compact,
its total measure, $\sigma(c)$, is finite, and so, for any integrable function $f$ on $\Sigma_c$, we
can define its spatial average by $\hat{f} = \sigma(c)^{-1}\int_{\Sigma_c} f(x)\,d\sigma(x)$. (This is the quantity
called the "microcanonical average" of $f$ in statistical mechanics.) We note that
these measures $d\sigma$ are canonically determined in terms of the Liouville form, $\Omega^n$,
and the Hamiltonian function $H$, so if $\psi$ is any diffeomorphism of $P$ that preserves
$\Omega^n$ and preserves $H$, then $\psi$ will also preserve the $d\sigma$ and hence all microcanonical
averages, i.e., if $g = f \circ \psi$, then $\hat{g} = \hat{f}$.

We return now to the question of "equipartition of energy". We assume that we
have canonical variables $(p_1, \ldots, p_n, q_1, \ldots, q_n)$ in $P$ in which $H$ takes the classical
form $H = K + U$ where $K = \sum_i p_i^2/(2m_i)$ and $U$ is a function of the $q_i$ with a
non-degenerate local minimum, zero, at the origin. (It follows that for small $c$ the energy
surfaces $\Sigma_c$ are not only compact, but are in fact topologically spheres.) Since the
$p$'s and $q$'s are canonical, $\Omega$ has the standard Darboux form $\sum_i dp_i \wedge dq_i$, and so the
Liouville $2n$-form is just $dp_1 \wedge dq_1 \wedge \cdots \wedge dp_n \wedge dq_n$, giving Lebesgue measure as the
Liouville measure in these coordinates. Our goal is to prove that if $K_i = p_i^2/(2m_i)$ then
the microcanonical averages $\hat{K}_i$, $i = 1, \ldots, n$ (over any fixed energy surface $\Sigma_c$)
are all the same. It suffices to show that $\hat{K}_i = \hat{K}_j$ for any pair of indices, and
without loss of generality we can assume that $i = 1$ and $j = 2$;
by the remark above it will then suffice to find a diffeomorphism $\psi$ that preserves
$H = K + U$ and the Liouville form such that $K_2 = K_1 \circ \psi$. In fact, define

$$\psi(p_1, p_2, p_3, \ldots, p_n, q_1, \ldots, q_n) = (\alpha p_2, \alpha^{-1} p_1, p_3, \ldots, p_n, q_1, \ldots, q_n),$$

where $\alpha = \sqrt{m_1/m_2}$. Now, while $\psi$ is clearly not symplectic, it just as clearly does
preserve the Liouville form. Moreover a trivial calculation shows that $K_2 = K_1 \circ \psi$
and $K_1 = K_2 \circ \psi$, while $K_i = K_i \circ \psi$ for $i > 2$. Since $K = \sum_i K_i$, $K = K \circ \psi$.
Since $U$ is a function of the $q$'s and not the $p$'s, $U = U \circ \psi$, so $H = H \circ \psi$ also, and
this completes the proof that $\hat{K}_2 = \hat{K}_1$.
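
The "trivial calculation" in this proof can be spot-checked numerically. The sketch below
is illustrative only: the masses and the potential U are hypothetical choices made here
(any function of the q's alone would serve), and the check is made at one randomly chosen
phase-space point rather than proved in general. It confirms that the map exchanges the
first two partial kinetic energies while leaving H unchanged. Its Jacobian in the
$(p_1, p_2)$ block has determinant $-1$, so Lebesgue measure is preserved.

```python
import math
import random

# Hypothetical masses and potential, chosen only for illustration.
masses = [1.0, 3.0, 2.0]

def U(q):
    """A function of the q's alone, with minimum value 0 at the origin."""
    return sum(qi * qi for qi in q) + (q[0] * q[1] * q[2]) ** 2

def K_i(i, p):
    """Kinetic energy in the i-th degree of freedom (0-based index)."""
    return p[i] ** 2 / (2.0 * masses[i])

def H(p, q):
    """Total energy H = K + U."""
    return sum(K_i(i, p) for i in range(len(p))) + U(q)

def psi(p, q):
    """The map (p1, p2, p3, ..., q) -> (alpha*p2, p1/alpha, p3, ..., q)."""
    alpha = math.sqrt(masses[0] / masses[1])
    return [alpha * p[1], p[0] / alpha] + list(p[2:]), list(q)

rng = random.Random(0)                      # fixed seed for reproducibility
p = [rng.uniform(-1.0, 1.0) for _ in masses]
q = [rng.uniform(-1.0, 1.0) for _ in masses]
p2, q2 = psi(p, q)

print(K_i(0, p2), K_i(1, p))    # K_1 composed with psi, compared with K_2
print(K_i(1, p2), K_i(0, p))    # K_2 composed with psi, compared with K_1
print(H(p2, q2), H(p, q))       # H composed with psi, compared with H
```

Each printed pair should agree to rounding error, which is exactly the invariance the
argument uses to conclude that the two microcanonical averages coincide.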

There is an important corollary of the above proof. Suppose that we can write the
potential energy $U$ as the sum of $n$ functions $U_i$, and let us define $H_i = K_i + U_i$.
You should think of $U_i$ as representing the "potential energy in the $i$-th normal
mode", and similarly $H_i$ represents the part of the total energy that is "in" the
$i$-th normal mode. In applications where the potential $U$ describes an interaction
between identical particles, these partial potentials will satisfy $U_1(q_1, q_2, \ldots, q_n) =
U_2(q_2, q_1, \ldots, q_n)$, and similarly for other pairs of indices. (For the example of
the preceding section, we note that these conditions will be satisfied if the "spring
constants" $k_i$ are all equal and if the functions $a_{jkl}$ are symmetric in all three
indices.) We remark that, in particular, these conditions are satisfied for the
Fermi-Pasta-Ulam Lattice that we will consider shortly. If we now redefine $\psi$ above
to simply interchange $q_i$ and $q_j$, then the same argument as before shows that
$\hat{U}_i = \hat{U}_j$, and so of course we also have $\hat{H}_i = \hat{H}_j$. In words, for such systems not
only kinetic energy per mode, but also potential and total energies per mode are
"equi-partitioned", in the sense that their microcanonical averages are equal.

Next recall that for $p$ in $\Sigma_c$ we define the time average of $f$ on the orbit of $p$ by:

$$\bar{f}(p) = \lim_{T \to \infty} \frac{1}{T} \int_0^T f(\phi_t(p))\,dt,$$

provided the limit exists. G. D. Birkhoff's Individual Ergodic Theorem ([Bi]) states
that $\bar{f}(p)$ is defined for almost all $p$ in $\Sigma_c$, and then clearly $\bar{f}$ is invariant under
the flow. It is moreover again an integrable function on $\Sigma_c$ with the same spatial
average as $f$ itself. It is then easily seen that the following four conditions are
equivalent:

1) For every integrable function $f$ on $\Sigma_c$, its time average $\bar{f}$ is constant (and
hence equal to its spatial average).

2) Every measurable subset of $\Sigma_c$ that is invariant under the flow either has
measure zero or has measure $\sigma(c)$.

3) If an integrable function on $\Sigma_c$ is constant on each orbit of the flow then
it is constant (almost everywhere) on $\Sigma_c$.

4) Given two subsets $E_1$ and $E_2$ of $\Sigma_c$ having positive measure, some translate
$\phi_t(E_1)$ of $E_1$ meets $E_2$ in a set of positive measure.

and if these equivalent conditions are satisfied, then the flow is said to be ergodic
or metrically transitive on $\Sigma_c$.

By choosing $f$ to be the characteristic function of an open set $O$, we see from
1) that ergodicity implies that the motion has a "stochastic" nature; that is, the
fraction of time that an orbit spends in $O$ is equal to the measure of $O$ as a fraction
of $\sigma(c)$ (so in particular almost all orbits are dense in $\Sigma_c$). This implies that (apart from $\Sigma_c$
itself) there cannot exist any stable fixed point, periodic orbit, or more general
stable invariant set. To put it somewhat more informally, orbits on an ergodic $\Sigma_c$
cannot exhibit any simple asymptotic behavior.

Note that any function of a constant of the motion will again be a constant 
of the motion — and in particular any function of H is a constant of the motion. 
There may of course be constants of the motion that are functionally independent 
of H. But if the flow is ergodic on every energy surface, then it follows from 3) 
that any constant of the motion will be constant on each level set of H, which is
just to say that it is a function of H. This shows that Hamiltonian systems with 
many independent constants of the motion (and in particular completely integrable 
systems) are in some sense at the opposite extreme from ergodic systems. 

So what is the status of the old belief that a "generic" (in some suitable sense) 
Hamiltonian system should be ergodic on each energy surface? On the one hand, 
Fermi [Fe] proved a result that points in this direction. And there is a famous 
result of Oxtoby and Ulam ([OU]) to the effect that in the set of all measure pre- 
serving homeomorphisms of an energy surface, those that are metrically transitive 
are generic in the sense of category. But the measure preserving diffeomorphisms of 
an energy surface are themselves only a set of first category in the measure preserv- 
ing homeomorphisms, so the Oxtoby-Ulam theorem is not particularly relevant to 
this question. In fact, the KAM (Kolmogorov-Arnold-Moser) Theorem ([Ar],
Appendix 8) shows that any Hamiltonian flow that is sufficiently close to a completely
integrable system in a suitable $C^k$ topology will have a set of invariant tori of
positive Liouville measure, and so cannot be ergodic. Indeed, proving rigorously that
any particular Hamiltonian system is ergodic is quite difficult. For some examples
of such theorems see [AA].

3. Origins of Soliton Theory 

Perhaps the single most important event leading up to the explosive growth of
soliton mathematics in the last decades was a set of seemingly innocuous numerical
experiments, carried out by Enrico Fermi, John Pasta, and Stanislaw Ulam in 1954-55,
on the Los Alamos MANIAC computer. (Originally published as Los Alamos
Report LA1940 (1955) and reprinted in [FPU].)

1. The Fermi-Pasta-Ulam Experiments 

The following quotation is taken from Stanislaw Ulam's autobiography,
"Adventures of a Mathematician".

Computers were brand new; in fact the Los Alamos Maniac was barely
finished . . . . As soon as the machines were finished, Fermi, with his great
common sense and intuition, recognized immediately their importance for the
study of problems in theoretical physics, astrophysics, and classical physics.
We discussed this at length and decided to formulate a problem simple to state,
but such that a solution would require a lengthy computation which could not
be done with pencil and paper or with existing mechanical computers . . . . [W]e
found a typical one . . . the consideration of an elastic string with two fixed
ends, subject not only to the usual elastic force of stress proportional
to strain, but having, in addition, a physically correct nonlinear term . . . .
The question was to find out how . . . the entire motion would eventually
thermalize . . . .

John Pasta, a recently arrived physicist, assisted us in the task of flow 
diagramming, programming, and running the problem on the Maniac .... 

The problem turned out to be felicitously chosen. The results were entirely 
different qualitatively from what even Fermi, with his great knowledge of wave 
motion had expected. 

What Fermi, Pasta, and Ulam (FPU) were trying to do was to verify numerically a 
basic article of faith of statistical mechanics; namely the belief that if a mechanical 
system has many degrees of freedom and is close to a stable equilibrium, then a 
generic nonlinear interaction will "thermalize" the energy of the system, i.e., cause 
the energy to become equidistributed among the normal modes of the corresponding 
linearized system. In fact, Fermi believed he had demonstrated this fact in [Fe].
Equipartition of energy among the normal modes is known to be closely related 
to the ergodic properties of such a system, and in fact FPU state their goal as 
follows: "The ergodic behavior of such systems was studied with the primary aim 
of establishing, experimentally, the rate of approach to the equipartition of energy 
among the various degrees of freedom of the system." 

FPU make it clear that the problem that they want to simulate is the vibrations 
of a "one-dimensional continuum" or "string" with fixed end-points and nonlinear 
elastic restoring forces, but that "for the purposes of numerical work this continuum 
is replaced by a finite number of points ... so that the PDE describing the motion 
of the string is replaced by a finite number of ODE". To rephrase this in the
current jargon, FPU study a one-dimensional lattice of N oscillators with nearest 
neighbor interactions and zero boundary conditions. (For their computations, FPU 
take N = 64.) 

We imagine the original string to be stretched along the $x$-axis from $0$ to its length
$\ell$. The $N$ oscillators have equilibrium positions $p_i = ih$, $i = 0, \ldots, N-1$, where
$h = \ell/(N-1)$ is the lattice spacing, so their positions at time $t$ are $X_i(t) = p_i +
x_i(t)$ (where the $x_i$ represent the displacements of the oscillators from equilibrium).
The force attracting any oscillator to one of its neighbors is taken as $k(\delta + \alpha\delta^2)$,
$\delta$ denoting the "strain", i.e., the deviation of the distance separating these two
oscillators from their equilibrium separation $h$. (Note that when $\alpha = 0$ this is just
a linear Hooke's law force with spring constant $k$.) The force acting on the $i$-th
oscillator due to its right neighbor is $F(x)_i^+ = k[(x_{i+1} - x_i) + \alpha(x_{i+1} - x_i)^2]$ while
the force acting on the $i$-th oscillator due to its left neighbor is $F(x)_i^- = k[(x_{i-1} -
x_i) - \alpha(x_{i-1} - x_i)^2]$. Thus the total force acting on the $i$-th oscillator will be the
sum of these two forces, namely: $F(x)_i = k(x_{i+1} + x_{i-1} - 2x_i)[1 + \alpha(x_{i+1} - x_{i-1})]$,
and assuming that all of the oscillators have the same mass, $m$, Newton's equations
of motion read:

$$m\ddot{x}_i = k(x_{i+1} + x_{i-1} - 2x_i)[1 + \alpha(x_{i+1} - x_{i-1})],$$

with the boundary conditions $x_0(t) = x_{N-1}(t) = 0$. In addition, FPU looked at
motions of the lattice that start from rest, i.e., they assumed that $\dot{x}_i(0) = 0$, so the
motion of the lattice is completely specified by giving the $N - 2$ initial displacements
$x_i(0)$, $i = 1, \ldots, N-2$. We shall call this the FPU initial value problem (with initial
condition $x_i(0)$).
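
The FPU initial value problem is straightforward to set up on a modern machine. The
following is a hedged sketch, not a reconstruction of the original MANIAC program: the
function names, the velocity-Verlet integrator, the lattice size N = 32, the amplitude
and the value of alpha are all assumptions made here for illustration. The energy
function uses the bond potential $k(\delta^2/2 + \alpha\delta^3/3)$, whose derivative is
the force law $k(\delta + \alpha\delta^2)$ quoted above.

```python
import math

def fpu_accel(x, k=1.0, m=1.0, alpha=0.25):
    """Accelerations from
    m x_i'' = k (x_{i+1} + x_{i-1} - 2 x_i) [1 + alpha (x_{i+1} - x_{i-1})].
    The end points x[0] and x[-1] are held fixed (zero acceleration)."""
    n = len(x)
    a = [0.0] * n
    for i in range(1, n - 1):
        a[i] = ((k / m) * (x[i + 1] + x[i - 1] - 2.0 * x[i])
                * (1.0 + alpha * (x[i + 1] - x[i - 1])))
    return a

def fpu_energy(x, v, k=1.0, m=1.0, alpha=0.25):
    """Total energy: kinetic part plus k (d^2/2 + alpha d^3/3) per bond."""
    kinetic = 0.5 * m * sum(vi * vi for vi in v)
    potential = 0.0
    for i in range(len(x) - 1):
        d = x[i + 1] - x[i]          # strain of the bond between i and i+1
        potential += k * (d * d / 2.0 + alpha * d ** 3 / 3.0)
    return kinetic + potential

def fpu_solve(x0, dt=0.01, steps=5000, k=1.0, m=1.0, alpha=0.25):
    """Integrate from rest (all initial velocities zero) by velocity Verlet."""
    x = list(x0)
    v = [0.0] * len(x)
    a = fpu_accel(x, k, m, alpha)
    for _ in range(steps):
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        a = fpu_accel(x, k, m, alpha)
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
    return x, v

N = 32                                             # illustrative lattice size
x0 = [0.1 * math.sin(math.pi * i / (N - 1)) for i in range(N)]
x0[0] = x0[-1] = 0.0                               # enforce fixed end points
x, v = fpu_solve(x0)
e0 = fpu_energy(x0, [0.0] * N)
e1 = fpu_energy(x, v)
print("energy at start:", e0, "energy at end:", e1)
```

Conservation of the total energy is a useful sanity check on the integrator before any
attempt is made to follow the energy in individual modes.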

It will be convenient to rewrite Newton's equations in terms of parameters that
refer more directly to the original string that we are trying to model. Namely, if
$\rho$ denotes the density of the string then $m = \rho h$, while if $\kappa$ denotes the Young's
modulus for the string (i.e., the spring constant for a piece of unit length) then
$k = \kappa/h$ will be the spring constant for a piece of length $h$. Defining $c = \sqrt{\kappa/\rho}$ we
can now rewrite Newton's equations as:

$$\text{(FPU)} \qquad \ddot{x}_i = c^2 \left( \frac{x_{i+1} + x_{i-1} - 2x_i}{h^2} \right) [1 + \alpha(x_{i+1} - x_{i-1})],$$

and in this form we shall refer to them as the FPU Lattice Equations. We can
now "pass to the continuum limit", i.e., by letting $N$ tend to infinity (so $h$ tends
to zero) we can attempt to derive a PDE for the function $u(x,t)$ that measures the
displacement at time $t$ of the particle of string with equilibrium position $x$. We shall
leave the nonlinear case for later, and here restrict our attention to the linear case,
$\alpha = 0$. If we take $x = p_i$, then by definition $u(x,t) = x_i(t)$ and since $p_i + h = p_{i+1}$
while $p_i - h = p_{i-1}$, with $\alpha = 0$ the latter form of Newton's equations gives:

$$u_{tt}(x,t) = c^2\,\frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2}.$$

By Taylor's formula:

$$f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2!} f''(x) \pm \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f''''(x) + O(h^5),$$

and taking $f(x) = u(x,t)$ this gives:

$$\frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2} = u_{xx}(x,t) + \left(\frac{h^2}{12}\right) u_{xxxx}(x,t) + O(h^4),$$

so letting $h \to 0$, we find $u_{tt} = c^2 u_{xx}$, i.e., $u$ satisfies the linear wave equation, with
propagation speed $c$ (and of course the boundary conditions $u(0,t) = u(\ell,t) = 0$,
and initial conditions $u_t(x,0) = 0$, $u(x,0) = u_0(x)$).
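
The Taylor estimate for the second difference quotient is easy to test numerically. The
snippet below is a small illustration under assumptions made here: the test function is
the sine (so that $f'' = -\sin$ and $f'''' = \sin$) and the sample point x = 0.7 is
arbitrary. It compares the actual error of the difference quotient with the predicted
leading term $(h^2/12) f''''(x)$.

```python
import math

def second_difference(f, x, h):
    """Centered second difference quotient (f(x+h) + f(x-h) - 2 f(x)) / h^2."""
    return (f(x + h) + f(x - h) - 2.0 * f(x)) / (h * h)

x = 0.7                                   # arbitrary sample point
for h in (0.1, 0.05, 0.025):
    # For f = sin the exact second derivative is -sin(x).
    error = second_difference(math.sin, x, h) - (-math.sin(x))
    # Predicted leading error term: (h^2 / 12) * f''''(x), with f'''' = sin.
    predicted = (h * h / 12.0) * math.sin(x)
    print(h, error, predicted, error / predicted)
```

Halving h should cut the error by roughly a factor of four, and the ratio in the last
column should approach 1, consistent with the neglected terms being of higher order in h.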

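This is surely one of the most famous initial value problems of mathematical
physics, and nearly every mathematician sees a derivation of both the d'Alembert
and Fourier versions of its solution early in their careers. For each positive integer
$k$ there is a normal mode or "standing wave" solution:

$$u_k(x,t) = \cos\left(\frac{k\pi c t}{\ell}\right) \sin\left(\frac{k\pi x}{\ell}\right),$$

and the solution to the initial value problem is $u(x,t) = \sum_{k=1}^{\infty} a_k u_k(x,t)$ where the
$a_k$ are the Fourier coefficients of $u_0$:

$$a_k = \frac{2}{\ell} \int_0^{\ell} u_0(x) \sin\left(\frac{k\pi x}{\ell}\right) dx.$$

Replacing $x$ by $p_j = jh$ in $u_k(x,t)$ (and using $\ell = (N-1)h$) we get functions

$$\xi_j^{(k)}(t) = \cos\left(\frac{k\pi c t}{(N-1)h}\right) \sin\left(\frac{kj\pi}{N-1}\right),$$

and it is natural to conjecture that these will be the normal modes for the FPU
initial value problem (with $\alpha = 0$ of course). This is easily checked using the addition
formula for the sine function, which shows that the spatial factors $\sin(kj\pi/(N-1))$ are
exactly right; the temporal frequency of the $k$-th lattice mode works out to
$(2c/h)\sin(k\pi/(2(N-1)))$, which agrees with $k\pi c/\ell$ up to terms of relative order $(k/N)^2$.
It follows that, in the linearized case, the solution
to the FPU initial value problem with initial conditions $x_i(0)$ is given explicitly by
$x_j(t) = \sum_{k=1}^{N-2} a_k \xi_j^{(k)}(t)$, where the Fourier coefficients $a_k$ are determined from the
formula:

$$a_k = \frac{2}{N-1} \sum_{j=1}^{N-2} x_j(0) \sin\left(\frac{kj\pi}{N-1}\right).$$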

Of course, when $\alpha$ is zero and the interactions are linear we are in effect dealing
with N — 2 uncoupled harmonic oscillators (the above normal modes) and there is 
no thermalization. On the contrary, the sum of the kinetic and potential energy of 
each of the normal modes is a constant of the motion! 
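
The absence of thermalization in the linear lattice can be observed directly. The sketch
below rests on several assumptions made here rather than taken from the FPU report: units
in which $c/h = 1$, a small lattice of N = 16 points, a velocity-Verlet integrator, and
modal energies defined (up to a common constant factor) from the discrete sine
coefficients, using the lattice frequency $2\sin(k\pi/(2(N-1)))$ of the $k$-th mode in
these units. Starting in the fundamental mode with alpha = 0, the energy in mode 1 should
stay constant and the energy in modes 2 and 3 should stay at rounding-error level.

```python
import math

N = 16          # lattice points, including the two fixed end points
M = N - 1       # number of bonds, so that the string length is M * h

def accel(x, alpha):
    """FPU Lattice Equations in units where c / h = 1."""
    a = [0.0] * N
    for i in range(1, N - 1):
        a[i] = ((x[i + 1] + x[i - 1] - 2.0 * x[i])
                * (1.0 + alpha * (x[i + 1] - x[i - 1])))
    return a

def coeff(z, k):
    """Discrete sine coefficient of the vector z in mode k."""
    return (2.0 / M) * sum(z[j] * math.sin(k * j * math.pi / M)
                           for j in range(1, N - 1))

def mode_energy(x, v, k):
    """Kinetic plus potential energy in mode k, up to a common factor."""
    omega = 2.0 * math.sin(k * math.pi / (2.0 * M))   # lattice frequency
    a, adot = coeff(x, k), coeff(v, k)
    return 0.5 * (adot * adot + omega * omega * a * a)

def run(alpha, amplitude=0.1, dt=0.01, steps=20000, sample_every=2000):
    """Start at rest in the fundamental mode; sample energies of modes 1-3."""
    x = [amplitude * math.sin(j * math.pi / M) for j in range(N)]
    x[0] = x[-1] = 0.0
    v = [0.0] * N
    acc = accel(x, alpha)
    samples = []
    for n in range(steps + 1):
        if n % sample_every == 0:
            samples.append([mode_energy(x, v, k) for k in (1, 2, 3)])
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, acc)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        acc = accel(x, alpha)
        v = [vi + 0.5 * dt * ai for vi, ai in zip(v, acc)]
    return samples

linear = run(alpha=0.0)
for e1, e2, e3 in linear:
    print(e1, e2, e3)
```

Calling `run` with a small non-zero `alpha` and a much larger number of steps is one way
to watch energy begin to leak into the higher modes, which is the regime FPU set out to
study.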

But if $\alpha$ is small but non-zero, FPU expected (on the basis of then generally
accepted statistical mechanics arguments) that the energy would gradually shift
between modes so as to eventually roughly equalize the total of potential and kinetic
energy in each of the $N - 2$ normal modes $\xi^{(k)}$. To test this they started the lattice
in the fundamental mode $\xi^{(1)}$, with various values of $\alpha$, and integrated Newton's
equations numerically for a long time interval, interrupting the evolution from time 
to time to compute the total of kinetic plus potential energy in each mode. What 
did they find? Here is a quotation from their report: 

Let us say here that the results of our computations show features which 
were, from the beginning, surprising to us. Instead of a gradual, continuous 
flow of energy from the first mode to the higher modes, all of the problems 
showed an entirely different behavior. Starting in one problem with a qua- 
dratic force and a pure sine wave as the initial position of the string, we did 
indeed observe initially a gradual increase of energy in the higher modes as 
predicted (e.g., by Rayleigh in an infinitesimal analysis). Mode 2 starts
increasing first, followed by mode 3, and so on. Later on, however this gradual
sharing of energy among the successive modes ceases. Instead, it is one or 
the other mode that predominates. For example, mode 2 decides, as it were, 
to increase rather rapidly at the cost of the others. At one time it has more 
energy than all the others put together! Then mode 3 undertakes this role. 
It is only the first few modes which exchange energy among themselves, and 
they do this in a rather regular fashion. Finally, at a later time, mode 1 comes 
back to within one percent of its initial value, so that the system seems to be 
almost periodic. 
There is no question that Fermi, Pasta, and Ulam realized they had stumbled
onto something big. In his autobiography [Ul], Ulam devotes several pages to a 
discussion of this collaboration. Here is a little of what he says: 

I know that Fermi considered this to be, as he said, "a minor discovery." 
And when he was invited a year later to give the Gibbs Lecture (a great 
honorary event at the annual American Mathematical Society meeting), he 
intended to talk about it. He became ill before the meeting, and his lecture 
never took place .... 

The results were truly amazing. There were many attempts to find the rea- 
sons for this periodic and regular behavior, which was to be the starting point 
of what is now a large literature on nonlinear vibrations. Martin Kruskal, a 
physicist in Princeton, and Norman Zabusky, a mathematician at Bell Labs 
wrote papers about it. Later, Peter Lax contributed signally to the theory. 

Unfortunately, Fermi died in 1954, even before the paper cited above was published.
It was to have been the first in a series of papers, but with Fermi's passing it fell 
to others to follow up on the striking results of the Fermi-Pasta-Ulam experiments. 

The MANIAC computer, on which FPU carried out their remarkable research, 
was designed to carry out some computations needed for the design of the first 
hydrogen bombs, and of course it was a marvel for its day. But it is worth noting 
that it was very weak by today's standards — not just when compared with current 
supercomputers, but even when compared with modest desktop machines. At a 
conference held in 1977 Pasta recalled, "The program was of course punched on 
cards. A DO loop was executed by the operator feeding in the deck of cards over 
and over again until the loop was completed!" 

2. The Kruskal-Zabusky Experiments 

Following the FPU experiments, there were many attempts to explain the surprising 
quasi-periodicity of solutions of the FPU Lattice Equations. However it was not 
until ten years later that Martin Kruskal and Norman Zabusky took the crucial 
steps that led to an eventual understanding of this behavior [ZK]. 

In fact, they made two significant advances. First they demonstrated that, in a 
continuum limit, certain solutions of the FPU Lattice Equations could be described 
in terms of solutions of the so-called Korteweg-de Vries (or KdV) equation. And 
secondly, by investigating the initial value problem for the KdV equation numer- 
ically on a computer, they discovered that its solutions had remarkable behavior 
that was related to, but if anything even more surprising and unexpected than the 
anomalous behavior of the FPU lattice that they had set out to understand. 

Finding a good continuum limit for the nonlinear FPU lattice is a lot more 
sophisticated than one might at first expect after the easy time we had with the 
linear case. In fact the approach to the limit has to be handled with considerable 
skill to avoid inconsistent results, and it involves several non-obvious steps. 

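Let us return to the FPU Lattice Equations

(FPU) $\ddot{x}_i = \left(\frac{c}{h}\right)^2 (x_{i+1} + x_{i-1} - 2x_i)\,[1 + \alpha(x_{i+1} - x_{i-1})]$,

and as before we let $u(x,t)$ denote the function measuring the displacement at
time $t$ of the particle of string with equilibrium position $x$, so if $x = p_i$ then, by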
definition, $x_i(t) = u(x,t)$, $x_{i+1}(t) = u(x+h,t)$, and $x_{i-1}(t) = u(x-h,t)$. Of course
$\ddot{x}_i = u_{tt}(x,t)$ and, as noted earlier, Taylor's Theorem with remainder gives

$$\frac{x_{i+1} + x_{i-1} - 2x_i}{h^2} = \frac{u(x+h,t) + u(x-h,t) - 2u(x,t)}{h^2} = u_{xx}(x,t) + \left(\frac{h^2}{12}\right)u_{xxxx}(x,t) + O(h^4).$$

By a similar computation

$$\alpha(x_{i+1} - x_{i-1}) = (2\alpha h)u_x(x,t) + \left(\frac{\alpha h^3}{3}\right)u_{xxx}(x,t) + O(h^5),$$

so substitution in (FPU) gives

$$\left(\frac{1}{c^2}\right)u_{tt} - u_{xx} = (2\alpha h)u_x u_{xx} + \left(\frac{h^2}{12}\right)u_{xxxx} + O(h^4).$$
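
The Taylor estimate for the second difference quotient is easy to test numerically. The sketch below is only an illustration: it takes the sample function $u = \sin$ (an arbitrary choice, for which $u_{xx} = -\sin$ and $u_{xxxx} = \sin$) and checks that the difference between the lattice second difference and $u_{xx} + (h^2/12)u_{xxxx}$ shrinks like $h^4$, i.e., by a factor of about 16 when $h$ is halved.

```python
import math

def second_difference(u, x, h):
    """Lattice second difference (u(x+h) + u(x-h) - 2u(x)) / h^2."""
    return (u(x + h) + u(x - h) - 2.0 * u(x)) / h ** 2

def two_term_expansion(x, h):
    """u_xx + (h^2/12) u_xxxx for the sample function u = sin."""
    return -math.sin(x) + (h ** 2 / 12.0) * math.sin(x)

x = 1.0
errors = {h: abs(second_difference(math.sin, x, h) - two_term_expansion(x, h))
          for h in (0.1, 0.05)}
ratio = errors[0.1] / errors[0.05]  # close to 2**4 if the remainder is O(h^4)
print(errors, ratio)
```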

As a first attempt to derive a continuum description for the FPU lattice in the 
nonlinear case, it is tempting to just let $h$ approach zero and assume that $2\alpha h$
converges to a limit $\epsilon$. This would give the PDE

$$u_{tt} = c^2(1 + \epsilon u_x)u_{xx}$$

as our continuum limit for the FPU Lattice equations and the nonlinear generaliza- 
tion of the wave equation. But this leads to a serious problem. This equation is fa- 
miliar in applied mathematics — it was studied by Rayleigh in the last century — and 
it is easy to see from examples that its solutions develop discontinuities (shocks) 
after a time on the order of $(\epsilon c)^{-1}$, which is considerably shorter than the time
scale of the almost periods observed in the Fermi-Pasta-Ulam experiments. It was
Zabusky who realized that the correct approach was to retain the term of order $h^2$
and study the equation

(ZK) $\left(\frac{1}{c^2}\right)u_{tt} - u_{xx} = (2\alpha h)u_x u_{xx} + \left(\frac{h^2}{12}\right)u_{xxxx}$.

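If we differentiate this equation with respect to $x$ and make the substitution $v = u_x$,
we see that it reduces to the more familiar Boussinesq equation

$$\left(\frac{1}{c^2}\right)v_{tt} - v_{xx} = \alpha h\,\frac{\partial^2 (v^2)}{\partial x^2} + \left(\frac{h^2}{12}\right)v_{xxxx}.$$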



(The effect of the fourth order term is to add dispersion to the equation, and this 
smoothes out incipient shocks before they can develop.) 

It is important to realize that, since $h \neq 0$, (ZK) cannot logically be considered
a true continuum limit of the FPU lattice. It should rather be regarded as an 
asymptotic approximation to the lattice model that works for small lattice spacing 
h (and hence large N). Nevertheless, we shall now see how to pass from (ZK) to a 
true continuum description of the FPU lattice. 

The next step is to notice that, with $\alpha$ and $h$ small, solutions of (ZK) should
behave qualitatively like solutions of the linear wave equation $u_{tt} = c^2 u_{xx}$, and
increasingly so as $\alpha$ and $h$ tend to zero. Now the general solution of the linear wave
equation is of course $u(x,t) = f(x + ct) + g(x - ct)$, i.e., the sum of an arbitrary left
moving traveling wave and an arbitrary right moving traveling wave, both moving
with speed $c$. Recall that it is customary to simplify the analysis in the linear case
by treating each kind of wave separately, and we would like to do the same here.
That is, we would like to look for solutions $u(x,t)$ that behave more and more like
(say) right moving traveling waves of velocity $c$, and for longer and longer periods
of time, as $\alpha$ and $h$ tend to zero.

It is not difficult to make precise sense out of this requirement. Suppose that
$y(\xi,\tau)$ is a smooth function of two real variables such that the map $\tau \mapsto y(\cdot,\tau)$ is
uniformly continuous from $\mathbf{R}$ into the bounded functions on $\mathbf{R}$ with the sup norm,
i.e., given $\epsilon > 0$ there is a positive $\delta$ such that $|\tau - \tau_0| < \delta$ implies $|y(\xi,\tau) - y(\xi,\tau_0)| < \epsilon$.
Then for $|t - t_0| < T = \delta/(\alpha h c)$ we have $|\alpha hct - \alpha hct_0| < \delta$, so
$|y(x - ct, \alpha hct) - y(x - ct, \alpha hct_0)| < \epsilon$. In other words, the function $u(x,t) = y(x - ct, \alpha hct)$ is
uniformly approximated by the traveling wave $u^0(x,t) = y(x - ct, \alpha hct_0)$ on the
interval $|t - t_0| < T$ (and of course $T \to \infty$ as $\alpha$ and $h$ tend to zero). To restate this
a little more picturesquely, $u(x,t) = y(x - ct, \alpha hct)$ is approximately a traveling
wave whose shape gradually changes in time. Notice that if $y(\xi,\tau)$ is periodic or
almost periodic in $\tau$, the gradually changing shape of the approximate traveling
wave will also be periodic or almost periodic.

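To apply this observation, we define new variables $\xi = x - ct$ and $\tau = (\alpha h)ct$.
Then by the chain rule, $\partial^k/\partial x^k = \partial^k/\partial\xi^k$, $\partial/\partial t = -c(\partial/\partial\xi - (\alpha h)\partial/\partial\tau)$, and
$\partial^2/\partial t^2 = c^2(\partial^2/\partial\xi^2 - (2\alpha h)\partial^2/\partial\xi\partial\tau + (\alpha h)^2\partial^2/\partial\tau^2)$.
Thus in these new coordinates the wave operator transforms to:

$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} = -2\alpha h\frac{\partial^2}{\partial\xi\partial\tau} + (\alpha h)^2\frac{\partial^2}{\partial\tau^2},$$

so substituting $u(x,t) = y(\xi,\tau)$ in (ZK) (and dividing by $-2\alpha h$) gives:

$$y_{\xi\tau} - \left(\frac{\alpha h}{2}\right)y_{\tau\tau} = -y_\xi y_{\xi\xi} - \left(\frac{h}{24\alpha}\right)y_{\xi\xi\xi\xi},$$

and, at last, we are prepared to pass to the continuum limit. We assume that $\alpha$
and $h$ tend to zero at the same rate, i.e., that as $h$ tends to zero, the quotient $h/\alpha$
tends to a positive limit, and we define $\delta = \lim_{h\to 0}\sqrt{h/(24\alpha)}$. Then $\alpha h = O(h^2)$,
so letting $h$ approach zero gives $y_{\xi\tau} + y_\xi y_{\xi\xi} + \delta^2 y_{\xi\xi\xi\xi} = 0$. Finally, making the
substitution $v = y_\xi$ we arrive at the KdV equation:

(KdV) $v_\tau + v v_\xi + \delta^2 v_{\xi\xi\xi} = 0$.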

Remark. Note that if we re-scale the independent variables by $\tau \to \beta\tau$ and $\xi \to \gamma\xi$,
then the KdV equation becomes:

$$v_\tau + \left(\frac{\beta}{\gamma}\right)v v_\xi + \left(\frac{\beta}{\gamma^3}\right)\delta^2 v_{\xi\xi\xi} = 0,$$

so by appropriate choice of $\beta$ and $\gamma$ we can obtain any equation of the form
$v_\tau + \lambda v v_\xi + \mu v_{\xi\xi\xi} = 0$, and any such equation is referred to as "the KdV equation". A
commonly used choice that is convenient for many purposes is $v_\tau + 6vv_\xi + v_{\xi\xi\xi} = 0$,
although the form $v_\tau - 6vv_\xi + v_{\xi\xi\xi} = 0$ (obtained by replacing $v$ by $-v$) is equally
common. We will use both these forms.
Let us recapitulate the relationship between the FPU Lattice and the KdV
equation. Given a solution $x_i(t)$ of the FPU Lattice we get a function $u(x,t)$
by interpolation, i.e., $u(ih,t) = x_i(t)$, $i = 0,\ldots,N$. For small lattice spacing
$h$ and nonlinearity parameter $\alpha$ there will be solutions $x_i(t)$ so that the
corresponding $u(x,t)$ will be an approximate right moving traveling wave with slowly
varying shape, i.e., it will be of the form $u(x,t) = y(x - ct, \alpha hct)$ for some smooth
function $y(\xi,\tau)$, and the function $v(\xi,\tau) = y_\xi(\xi,\tau)$ will satisfy the KdV equation
$v_\tau + vv_\xi + \delta^2 v_{\xi\xi\xi} = 0$, where $\delta^2 = h/(24\alpha)$.

Having found this relationship between the FPU Lattice and the KdV equation, 
Kruskal and Zabusky made some numerical experiments, solving the KdV initial 
value problem for various initial data. Before discussing the remarkable results that 
came out of these experiments, it will be helpful to recall some of the early history 
of this equation. 

3. A First Look at KdV 

Korteweg and de Vries derived their equation in 1895 to settle a debate that had 
been going on since 1844, when the naturalist and naval architect John Scott Rus- 
sell, in an oft-quoted paper [Ru], reported an experience a decade earlier in which 
he followed the bow wave of a barge that had suddenly stopped in a canal. This 
"solitary wave" , some thirty feet long and a foot high, moved along the channel at 
about eight miles per hour, maintaining its shape and speed for over a mile as Rus- 
sell raced after it on horseback. Russell became fascinated with this phenomenon, 
and made extensive further experiments with such waves in a wave tank of his own 
devising, eventually deriving a (correct) formula for their speed as a function of 
height. The mathematicians Airy and Stokes made calculations which appeared to 
show that any such wave would be unstable and not persist for as long as Russell 
claimed. However, later work by Boussinesq (1872), Rayleigh (1876) and finally the 
Korteweg-de Vries paper in 1895 [KdV] pointed out errors in the analysis of Airy 
and Stokes and vindicated Russell's conclusions. 

The KdV equation is now accepted as controlling the dynamics of waves moving 
to the right in a shallow channel. Of course, Korteweg and de Vries did the obvious 
and looked for traveling-wave solutions for their equation by making the Ansatz 
$v(x,t) = f(x - ct)$. When this is substituted in the standard form of the KdV
equation it gives $-cf' + 6ff' + f''' = 0$. If we add the boundary conditions that $f$
should vanish at infinity, then a fairly routine analysis leads to the one parameter
family of traveling wave solutions $v(x,t) = 2a^2\,\mathrm{sech}^2(a(x - 4a^2 t))$, now referred
to as the one-soliton solutions of KdV. (These are of course the solitary waves of
Russell.) Note that the amplitude $2a^2$ is exactly half the speed $4a^2$, so that taller
waves move faster than their shorter brethren.
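
As a sanity check on the one-soliton formula, one can evaluate the residual of the standard form $v_t + 6vv_x + v_{xxx} = 0$ numerically. The sketch below is only illustrative: it uses central finite differences with a step `h` chosen by hand, and the sample points and the value of the parameter `a` are arbitrary. The residual should vanish up to the discretization error of the difference quotients.

```python
import math

def soliton(x, t, a):
    """One-soliton profile 2 a^2 sech^2(a (x - 4 a^2 t))."""
    return 2.0 * a ** 2 / math.cosh(a * (x - 4.0 * a ** 2 * t)) ** 2

def kdv_residual(v, x, t, h=1e-3):
    """Central-difference estimate of v_t + 6 v v_x + v_xxx at (x, t)."""
    v_t = (v(x, t + h) - v(x, t - h)) / (2.0 * h)
    v_x = (v(x + h, t) - v(x - h, t)) / (2.0 * h)
    v_xxx = (v(x + 2 * h, t) - 2.0 * v(x + h, t)
             + 2.0 * v(x - h, t) - v(x - 2 * h, t)) / (2.0 * h ** 3)
    return v_t + 6.0 * v(x, t) * v_x + v_xxx

a = 0.7
wave = lambda x, t: soliton(x, t, a)
sample_points = [(-2.0, 0.0), (0.3, 0.5), (1.5, 1.0), (4.0, 2.0)]
worst = max(abs(kdv_residual(wave, x, t)) for x, t in sample_points)
print(worst)  # small compared with the individual terms, which are of order one
```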

Now, back to Zabusky and Kruskal. For numerical reasons, they chose to deal
with the case of periodic boundary conditions, in effect studying the KdV equation
$u_t + uu_x + \delta^2 u_{xxx} = 0$ (which they label (1)) on the circle instead of on the line.
For their published report, they chose $\delta = 0.022$ and used the initial condition
$u(x,0) = \cos(\pi x)$. Here is an extract from their report (containing the first use of
the term "soliton") in which they describe their observations:

(I) Initially the first two terms of Eq. (1) dominate and the classical overtak- 
ing phenomenon occurs; that is u steepens in regions where it has negative 
slope. (II) Second, after u has steepened sufficiently, the third term becomes 
important and serves to prevent the formation of a discontinuity. Instead, 
oscillations of small wavelength (of order $\delta$) develop on the left of the front.
The amplitudes of the oscillations grow, and finally each oscillation achieves
an almost steady amplitude (that increases linearly from left to right) and
has the shape of an individual solitary-wave of (1). (III) Finally, each
"solitary-wave pulse" or soliton begins to move uniformly at a rate (relative to
the background value of u from which the pulse rises) which is linearly pro- 
portional to its amplitude. Thus, the solitons spread apart. Because of the 
periodicity, two or more solitons eventually overlap spatially and interact non- 
linearly. Shortly after the interaction they reappear virtually unaffected in size 
or shape. In other words, solitons "pass through" one another without losing 
their identity. Here we have a nonlinear physical process in which interacting 
localized pulses do not scatter irreversibly. 

(If you are not sure what Zabusky and Kruskal mean here by "the classical over- 
taking phenomenon", it will be explained in the next section.) 

Zabusky and Kruskal go on to describe a second interesting observation, a re- 
currence property of the solitons that goes a long way towards accounting for the 
surprising recurrence observed in the FPU Lattice. Let us explain again, but in 
somewhat different terms, the reason why the recurrence in the FPU Lattice is so 
surprising. The lattice is made up of a great many identical oscillators. Initially 
the relative phases of these oscillators are highly correlated by the imposed cosine 
initial condition. If the interactions are linear ($\alpha = 0$), then the oscillators are
harmonic and their relative phases remain constant. But, when $\alpha$ is positive, the
anharmonic forces between the oscillators cause their phases to start drifting relative
to each other in an apparently uncorrelated manner. The expected time before
the phases of all of the oscillators will be simultaneously close to their initial phases
is enormous, and increases rapidly with the total number $N$. But, from the point
of view of the KdV solitons, an entirely different picture appears. As mentioned in
the above paragraph, if $\delta$ is put equal to zero in the KdV equation, it reduces to
the so-called inviscid Burgers' Equation, which exhibits steepening and breaking of
a negatively sloped wave front in a finite time $T_B$. (For the above initial conditions,
the breaking time, $T_B$, can be computed theoretically to be $1/\pi$.) However, when
$\delta > 0$, just before breaking would occur, a small number of solitons emerge (eight in
the case of the above initial wave shape, $\cos(\pi x)$) and this number depends only on
the initial wave shape, not on the number of oscillators. The expected time for their
respective centers of gravity to all eventually "focus" at approximately the same
point of the circle is of course much smaller than the expected time for the much
larger number of oscillators to all return approximately to their original phases. In
fact, the recurrence time $T_R$ for the solitons turns out to be approximately equal to
$30.4\,T_B$, and at this time the wave shape $u(x, T_R)$ is uniformly very close to the
initial wave form $u(x,0) = \cos(\pi x)$. There is a second (somewhat weaker) focusing at
time $t = 2T_R$, etc. (Note that these times are measured in units of the "slow time",
$\tau$, at which the shape of the FPU traveling wave evolves, not in the "fast time",
$t$, at which the traveling wave moves.) In effect, the KdV solitons are providing a
hidden correlation between the relative phases of the FPU oscillators!

Notice that, as Zabusky and Kruskal emphasize, it is the persistence or shape 
conservation of the solitons that provides the explanation of recurrence. If the 
shapes of the solitons were not preserved when they interacted, there would be no 
way for them to all get back together and approximately reconstitute the initial 
condition at some later time. Here in their own words is how they bring in solitons
to account for the fact that thermalization was not observed in the FPU experiment:

Furthermore, because the solitons are remarkably stable entities, preserving
their identities throughout numerous interactions, one would expect this
system to exhibit thermalization (complete energy sharing among the
corresponding linear normal modes) only after extremely long times, if ever.

But this explanation, elegant as it may be, only pushes the basic question back
a step. A full understanding of FPU recurrence requires that we comprehend the 
reasons behind the remarkable new phenomenon of solitonic behavior, and in par- 
ticular why solitons preserve their shape. In fact, it was quickly recognized that the 
soliton was itself a vital new feature of nonlinear dynamics, so that understanding 
it better and discovering other nonlinear wave equations that had soliton solutions 
became a primary focus for research in both pure and applied mathematics. The 
mystery of the FPU Lattice recurrence soon came to be regarded as an important 
but fortuitous spark that ignited this larger effort. 

The next few short sections explain some elementary but important facts about 
one-dimensional wave equations. If you know about shock development, and how 
dispersion smooths shocks, you can skip these sections without loss of continuity. 

4. "Steepening" and "Breaking" 

Several times already we have referred to the phenomenon of "steepening and breaking
of negatively sloped wave-fronts" for certain wave equations. If you have never
seen this explained it probably sounds suggestive but also a little mysterious. In 
fact something very simple is going on that we will now explain. 

Let us start with the most elementary of all one-dimensional wave equations,
the linear advection equation (or forward wave equation), $u_t + cu_x = 0$. If we
think of the graph of $x \mapsto u(x,t)$ as representing the profile of a wave at time $t$,
then this equation describes a special evolutionary behavior of the wave profile in
time. In fact, if $u_0(x) = u(x,0)$ is the "initial" shape of the wave, then the unique
solution of the equation with this initial condition is the so-called "traveling wave"
$u(x,t) = u_0(x - ct)$, i.e., just the initial wave profile translating rigidly to the right
at a uniform velocity $c$. In other words, we can construct the wave profile at time
$t$ by translating each point on the graph of $u_0(x)$ horizontally by an amount $ct$. As
we shall now see, this has a remarkable generalization.

We shall be interested in the non-viscous Burgers' equation, $u_t + uu_x = 0$, but it
is just as easy to treat the more general equation $u_t + f(u)u_x = 0$, where $f : \mathbf{R} \to \mathbf{R}$
is some smooth function. Let me call this simply the nonlinear advection equation
or NLA.

Proposition. Let $u(x,t)$ be a smooth solution of the nonlinear advection equation
$u_t + f(u)u_x = 0$ for $x \in \mathbf{R}$ and $t \in [0, t_0]$, and with initial condition $u_0(x) = u(x,0)$.
Then for $t < t_0$ the graph of $x \mapsto u(x,t)$ can be constructed from the graph of $u_0$ by
translating each point $(x, u_0(x))$ horizontally by an amount $f(u_0(x))t$.

Proof. The proof is by the "method of characteristics", i.e., we look for curves
$(x(s), t(s))$ along which $u(x,t)$ must be a constant (say $c$), because $u$ satisfies
NLA. If we differentiate $u(x(s),t(s)) = c$ with respect to $s$, then the chain rule
gives $u_x(x(s),t(s))x'(s) + u_t(x(s),t(s))t'(s) = 0$, and hence
$dx/dt = x'(s)/t'(s) = -u_t(x(s),t(s))/u_x(x(s),t(s))$, and now substitution from NLA gives:

$$dx/dt = f(u(x(s),t(s))) = f(c),$$






so the characteristic curves are straight lines, whose slope is $f(c)$, where $c$ is the
constant value the solution $u$ has along that line. In particular, if we take the
straight line with slope $f(u_0(x))$ starting from the point $(x, 0)$, then $u(x,t)$ will have
the constant value $u_0(x)$ along this line, a fact that is equivalent to the conclusion
of the Proposition. ■

It is now easy to explain steepening and breaking. We assume that the function
$f$ is monotonically increasing and that $u_0(x)$ has negative slope (i.e., is strictly
decreasing) on some interval $I$. If we follow the part of the wave profile that
is initially over the interval $I$, we see from the Proposition that the higher part
(to the left) will move faster than the lower part (to the right), and so gradually
overtake it. The result is that the wave "bunches up" and its slope increases (this
is steepening), and eventually there will be a first time $T_B$ when the graph has a
vertical tangent (this is breaking). Clearly the solution cannot be continued past
$t = T_B$, since for $t > T_B$ the Proposition would give a multi-valued graph for $u(x,t)$.
It is an easy exercise to show that, for Burgers' equation ($f(u) = u$), the breaking
time $T_B$ is given by $|\min(u_0'(x))|^{-1}$.
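
The Proposition lends itself to a small numerical experiment. The sketch below is illustrative only: the helper names, the grid and the finite-difference step are my own choices. It moves each point of the initial profile horizontally by $f(u_0(x))t$ and checks when the resulting map $x \mapsto x + f(u_0(x))t$ stops being increasing, which is exactly when the translated graph becomes multi-valued. For $f(u) = u$ and $u_0(x) = \cos(\pi x)$ the estimated breaking time should agree with $1/\pi$.

```python
import math

def char_map(x, t, u0, f):
    """Position at time t of the point of the profile that started above x."""
    return x + f(u0(x)) * t

def is_monotone(t, u0, f, xs):
    """True while the translated graph is still single-valued on the grid xs."""
    images = [char_map(x, t, u0, f) for x in xs]
    return all(b > a for a, b in zip(images, images[1:]))

def breaking_time(u0, f, xs, dx=1e-6):
    """-1 / min of d/dx f(u0(x)), with the derivative taken by central differences."""
    slopes = [(f(u0(x + dx)) - f(u0(x - dx))) / (2.0 * dx) for x in xs]
    return -1.0 / min(slopes)

u0 = lambda x: math.cos(math.pi * x)      # Zabusky-Kruskal initial profile
f = lambda u: u                           # Burgers' equation
xs = [2.0 * i / 4000 for i in range(4001)]  # one period, [0, 2]

T_B = breaking_time(u0, f, xs)
print(T_B, 1.0 / math.pi)
print(is_monotone(0.9 * T_B, u0, f, xs), is_monotone(1.1 * T_B, u0, f, xs))
```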

This explains the first part of the above quotation from Zabusky and Kruskal, 
namely, "Initially the first two terms of Eq. (1) dominate and the classical overtak- 
ing phenomenon occurs; that is u steepens in regions where it has negative slope." 
But what about their next comment: "Second, after u has steepened sufficiently, 
the third term becomes important and serves to prevent the formation of a discon- 
tinuity"? To explain this we have to take up the matter of dispersion. 

5. Dispersion 

Let us next consider linear wave equations of the form $u_t + P\left(\frac{\partial}{\partial x}\right)u = 0$, where
$P$ is a polynomial. Recall that a solution $u(x,t)$ of the form $e^{i(kx - \omega t)}$ is called a
plane-wave solution; $k$ is called the wave number (waves per unit length) and $\omega$
the (angular) frequency. Rewriting this in the form $e^{ik(x - (\omega/k)t)}$, we recognize that
this is a traveling wave of velocity $\omega/k$. If we substitute this $u(x,t)$ into our wave
equation, we get a formula determining a unique frequency $\omega(k)$ associated to any
wave number $k$, which we can write in the form $\frac{\omega(k)}{k} = \frac{1}{ik}P(ik)$. This is called the
"dispersion relation" for this wave equation. Note that it expresses the velocity
for the plane-wave solution with wave number $k$. For example, $P\left(\frac{\partial}{\partial x}\right) = c\frac{\partial}{\partial x}$
gives the linear advection equation $u_t + cu_x = 0$, which has the dispersion relation
$\frac{\omega}{k} = c$, showing of course that all plane-wave solutions travel at the same velocity
$c$, and we say that we have trivial dispersion in this case. On the other hand, if
we take $P\left(\frac{\partial}{\partial x}\right) = \left(\frac{\partial}{\partial x}\right)^3$, then our wave equation is $u_t + u_{xxx} = 0$, which is the
KdV equation without its nonlinear term, and we have the non-trivial dispersion
relation $\frac{\omega}{k} = -k^2$. In this case, plane waves of large wave-number (and hence
high frequency) are traveling much faster than low-frequency waves. The effect of
this is to "broaden a wave-packet". That is, suppose our initial condition is $u_0(x)$.
We can use the Fourier Transform to write $u_0$ in the form $u_0(x) = \int \hat{u}_0(k)e^{ikx}\,dk$,
and then, by superposition, the solution to our wave equation will be
$u(x,t) = \int \hat{u}_0(k)e^{ik(x - (\omega(k)/k)t)}\,dk$. Suppose for example our initial wave form is a highly
peaked Gaussian. Then in the case of the linear advection equation all the Fourier 
modes travel together at the same speed and the Gaussian lump remains highly 
peaked over time. On the other hand, for the linearized KdV equation the various 
Fourier modes all travel at different velocities, so after a short time they start 
cancelling each other by destructive interference, and the originally sharp Gaussian 
quickly broadens. This is what Zabusky and Kruskal are referring to when they say
that "... the third term becomes important and serves to prevent the formation 
of a discontinuity." Just before breaking or shock-formation, the broadening effects 
of dispersion start to cancel the peaking effects of steepening. Indeed, careful 
analysis shows that in some sense, what gives KdV solitons their special properties 
of stability and longevity is a fine balance between the yin effects of dispersion and 
the yang effects of steepening. 
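
The dispersion relation $\omega(k)/k = P(ik)/(ik)$ is simple enough to evaluate directly. In the sketch below the polynomial $P$ is represented by its list of coefficients (a representation chosen here for illustration, with `coeffs[n]` multiplying $(\partial/\partial x)^n$), and the two examples from the text are recovered: constant velocity $c$ for the advection equation and velocity $-k^2$ for $u_t + u_{xxx} = 0$.

```python
def phase_velocity(coeffs, k):
    """omega(k)/k = P(ik)/(ik), where coeffs[n] is the coefficient of (d/dx)^n in P."""
    ik = 1j * k
    p_of_ik = sum(c * ik ** n for n, c in enumerate(coeffs))
    return p_of_ik / ik

advection = [0.0, 2.0]               # P = c d/dx with c = 2
linear_kdv = [0.0, 0.0, 0.0, 1.0]    # P = (d/dx)^3

for k in (0.5, 1.0, 3.0):
    # Advection: the same velocity for every k. Linear KdV: velocity -k^2.
    print(k, phase_velocity(advection, k), phase_velocity(linear_kdv, k))
```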

6. Split-stepping KdV 

There is an interesting question that is suggested by our analysis in the last
two sections. In the KdV equation, $u_t = -6uu_x - u_{xxx}$, if we drop the nonlinear
term, we have a constant coefficient linear PDE whose initial value problem can
be solved explicitly by the Fourier Transform. On the other hand, if we ignore
the linear third-order term, then we are left with the inviscid Burgers' equation,
whose initial value problem can be solved numerically by a variety of methods. (It
can also be solved in implicit form analytically, for short times, by the method of
characteristics,

$$u = u_0(x - 6ut)$$

but the solution is not conveniently represented on a fixed numerical grid.) Can we
somehow combine the methods for solving each of the two parts into an efficient
numerical method for solving the full KdV initial value problem?

In fact we can, and indeed there is a very general technique that applies to such 
situations. In the pure mathematics community it is usually referred to as the 
Trotter Product Formula, while in the applied mathematics and numerical analysis 
communities it is called split-stepping. Let me state it in the context of ordinary 
differential equations. Suppose that $Y$ and $Z$ are two smooth vector fields on
$\mathbf{R}^n$, and we know how to solve each of the differential equations $dx/dt = Y(x)$
and $dx/dt = Z(x)$, meaning that we know both of the flows $\phi_t$ and $\psi_t$ on $\mathbf{R}^n$
generated by $Y$ and $Z$ respectively. The Trotter Product Formula is a method
for constructing the flow $\theta_t$ generated by $Y + Z$ out of $\phi$ and $\psi$; namely, letting
$\Delta t = t/n$, $\theta_t = \lim_{n\to\infty}(\phi_{\Delta t}\psi_{\Delta t})^n$. The intuition behind the formula is simple.
Think of approximating the solution of $dx/dt = Y(x) + Z(x)$ by Euler's Method.
If we are currently at a point $p_0$, to propagate one more time step $\Delta t$ we go to the
point $p_0 + \Delta t(Y(p_0) + Z(p_0))$. Using the split-step approach on the other hand,
we first take an Euler step in the $Y(p_0)$ direction, going to $p_1 = p_0 + \Delta t\,Y(p_0)$,
then take a second Euler step, but now from $p_1$ and in the $Z(p_1)$ direction, going
to $p_2 = p_1 + \Delta t\,Z(p_1)$. If $Y$ and $Z$ are constant vector fields then this gives exactly
the same final result as the simple full Euler step with $Y + Z$, while for continuous
$Y$ and $Z$ and small time step $\Delta t$ it is a good enough approximation that the above
limit is valid.
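
Here is a toy illustration of the Trotter Product Formula in the plane. The two vector fields are chosen purely for convenience (they are not taken from the KdV setting): $Y(x,y) = (-y, x)$, whose flow is a rotation, and $Z(x,y) = (0, x)$, whose flow is a shear. They do not commute, and the flow of $Y + Z = (-y, 2x)$ is known in closed form, so the error of $(\phi_{\Delta t}\psi_{\Delta t})^n$ can be measured directly and should shrink as $n$ grows.

```python
import math

def rotate(p, t):
    """Exact flow phi_t of Y(x, y) = (-y, x)."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

def shear(p, t):
    """Exact flow psi_t of Z(x, y) = (0, x)."""
    x, y = p
    return (x, y + t * x)

def exact(p, t):
    """Exact flow theta_t of Y + Z, i.e. of x' = -y, y' = 2x."""
    x, y = p
    s = math.sqrt(2.0)
    return (x * math.cos(s * t) - y * math.sin(s * t) / s,
            s * x * math.sin(s * t) + y * math.cos(s * t))

def trotter(p, t, n):
    """(phi_dt psi_dt)^n applied to p: each step applies psi first, then phi."""
    dt = t / n
    for _ in range(n):
        p = rotate(shear(p, dt), dt)
    return p

p0, t = (1.0, 0.0), 1.0
errors = {n: math.dist(trotter(p0, t, n), exact(p0, t)) for n in (10, 100, 1000)}
print(errors)  # the error decreases as n increases
```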

The situation is more delicate for flows on infinite dimensional manifolds;
nevertheless it was shown by F. Tappert in [Ta] that the Cauchy Problem for
KdV can be solved numerically by using split-stepping to combine solution methods
for $u_t = -6uu_x$ and $u_t = -u_{xxx}$. In addition to providing a perspective
on an evolution equation's relation to its component parts, split-stepping allows
one to modify a code from solving KdV to the Kuramoto-Sivashinsky equation
($u_t + uu_x = -u_{xx} - u_{xxxx}$), or to study the joint zero-diffusion-dispersion limits of the
KdV-Burgers' equation ($u_t + 6uu_x = \nu u_{xx} + \epsilon u_{xxxx}$), by merely changing one line of code
in the Fourier module.
Tappert uses an interesting variant, known as Strang splitting, which was first
suggested in [St] to solve multi-dimensional hyperbolic problems by split-stepping
one-dimensional problems. The advantage of splitting comes from the greatly reduced
effort required to solve the smaller bandwidth linear systems which arise when
implicit schemes are necessary to maintain stability. In addition, Strang demonstrated
that second-order accuracy of the component methods need not be compromised
by the asymmetry of the splitting, as long as the pattern $\phi_{\Delta t/2}\psi_{\Delta t/2}\psi_{\Delta t/2}\phi_{\Delta t/2}$ is
used, to account for possible non-commutativity of $Y$ and $Z$. (This may be seen by
multiplying the respective exponential series.) No higher order analogue of Strang
splitting is available. Serendipitously, when output is not required, several steps
of Strang splitting require only marginal additional effort:

$$(\phi_{\Delta t/2}\psi_{\Delta t/2}\psi_{\Delta t/2}\phi_{\Delta t/2})^n = \phi_{\Delta t/2}\psi_{\Delta t}(\phi_{\Delta t}\psi_{\Delta t})^{n-1}\phi_{\Delta t/2}.$$
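
Both features of the symmetric pattern can be seen on a toy example. The sketch below assumes two illustrative planar vector fields, $Y(x,y) = (-y, x)$ with a rotation as its flow and $Z(x,y) = (0, x)$ with a shear as its flow; these are chosen only because the flow of $Y + Z$ is known in closed form. It checks that halving the time step reduces the error by roughly a factor of four, as one would expect from a second-order method, and that the merged form of $n$ symmetric steps gives the same result as applying the symmetric step $n$ times.

```python
import math

def phi(p, t):
    """Exact flow of Y(x, y) = (-y, x): rotation by angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

def psi(p, t):
    """Exact flow of Z(x, y) = (0, x): a shear."""
    x, y = p
    return (x, y + t * x)

def exact(p, t):
    """Exact flow of Y + Z, i.e. of x' = -y, y' = 2x."""
    x, y = p
    s = math.sqrt(2.0)
    return (x * math.cos(s * t) - y * math.sin(s * t) / s,
            s * x * math.sin(s * t) + y * math.cos(s * t))

def strang(p, t, n):
    """n symmetric steps: half step of phi, full step of psi, half step of phi."""
    dt = t / n
    for _ in range(n):
        p = phi(psi(phi(p, dt / 2), dt), dt / 2)
    return p

def strang_merged(p, t, n):
    """Same map, with adjacent half steps of phi fused into full steps."""
    dt = t / n
    p = phi(p, dt / 2)
    for _ in range(n - 1):
        p = phi(psi(p, dt), dt)
    return phi(psi(p, dt), dt / 2)

p0, t = (1.0, 0.0), 1.0
err = {n: math.dist(strang(p0, t, n), exact(p0, t)) for n in (50, 100)}
ratio = err[50] / err[100]
gap = math.dist(strang(p0, t, 100), strang_merged(p0, t, 100))
print(err, ratio, gap)
```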

7. A Symplectic Structure for KdV 

The FPU Lattice is a classical finite dimensional mechanical system, and as such 
it has a natural Hamiltonian formulation. However its relation to KdV is rather 
complex — and KdV is a PDE rather than a finite dimensional system of ODE — so 
it is not clear that it too can be viewed as a Hamiltonian system. We shall now 
see how this can be done in a simple and natural way. Moreover, when interpreted 
as the infinite dimensional analogue of a Hamiltonian system, KdV turns out to 
have a key property one would expect from any generalization to infinite dimen- 
sions of the concept of complete integrability in the Liouville sense; namely the 
existence of infinitely many functionally independent constants of the motion that 
are in involution. (Later, in discussing the inverse scattering method, we will indicate
how complete integrability was proved in a more precise sense by Faddeev and
Zakharov [ZF]; they demonstrated that the "scattering data" for the KdV equation
obey the characteristic Poisson bracket relations for the action-angle variables of a
completely integrable system.)

In 1971, Gardner and Zakharov independently showed how to interpret KdV
as a Hamiltonian system, starting from a Poisson bracket approach. From this
beginning Poisson brackets have played a significantly more important role in the
infinite dimensional theory of Hamiltonian systems than they did in the more classical
finite dimensional theory, and in recent years this has led to a whole theory of
so-called Poisson manifolds and Poisson Lie groups. However, we will start with the
more classical approach to Hamiltonian systems, defining a symplectic structure for
KdV first and then obtaining the Poisson bracket structure as a derived concept (cf.
Abraham and Marsden [AbM]). Thus, we will first exhibit a symplectic structure $\Omega$
for the phase space $P$ of the KdV equation and a Hamiltonian function, $H : P \to \mathbf{R}$,
such that the KdV equation takes the form $\dot{u} = (\nabla_s H)_u$.

For simplicity, we shall take as our phase space $P$ the Schwartz space, $\mathcal{S}(\mathbf{R})$, of
rapidly decreasing functions $u : \mathbf{R} \to \mathbf{R}$, although a much larger space would be
possible. (In [BS] it is proved that KdV defines a global flow on the Sobolev space
$H^4(\mathbf{R})$ of functions $u : \mathbf{R} \to \mathbf{R}$ with derivatives of order up to 4 in $L^2$, and it is
not hard to see that $P$ is an invariant subspace of this flow. See also [Ka1], [Ka2].)
For $u, v$ in $P$ we will denote their $L^2$ inner product $\int_{-\infty}^{\infty} u(x)v(x)\,dx$ by $\langle u, v\rangle$ and
we define

$$\Omega(u,v) = \frac{1}{2}\int_{-\infty}^{\infty}\left(v(x)\int u(x) - u(x)\int v(x)\right)dx,$$

where $\int u(x) = \int_{-\infty}^{x} u(y)\,dy$ denotes the indefinite integral of $u$. (For the periodic
KdV equation we take $P$ to be all smooth periodic functions of period $2\pi$ and
replace the $\int_{-\infty}^{\infty}$ by $\int_0^{2\pi}$.)

We denote by $\partial$ the derivative operator, $u \mapsto u'$, so $\partial\int u = u$,
and $\int\partial u = u$ for functions $u$ that vanish at infinity. We will also write
$u^{(k)}$ for $\partial^k u$, but for small $k$ we shall also use $u = u^{(0)}$,
$u_x = u^{(1)}$, $u_{xx} = u^{(2)}$, etc.

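There is a simple but important relation connecting $\Omega$, $\partial$, and the
$L^2$ inner product; namely:

$$\Omega(\partial u, v) = (u, v).$$

This is an immediate consequence of three obvious identities:
$\partial(u\int v) = (\partial u)\int v + uv$,
$\int_{-\infty}^{\infty}\partial(u\int v) = 0$, and
$\Omega(\partial u, v) = \frac{1}{2}\int_{-\infty}^{\infty}\big(vu - (\partial u)\int v\big)$.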
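The relation $\Omega(\partial u, v) = (u, v)$ is easy to test numerically. The following is only a sketch, assuming the form $\Omega(a, b) = \frac12\int(b\int a - a\int b)$ that is consistent with the three identities above; the grid, the trapezoid quadrature, and the two Gaussian test functions are illustrative choices, not part of the text.

```python
import math

def trapezoid(values, h):
    """Composite trapezoid rule on a uniform grid."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def indefinite_integral(values, h):
    """Running integral from the left end of the grid (stands in for -infinity)."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (a + b))
    return out

def omega(a, b, h):
    """Omega(a, b) = 1/2 * integral of (b * int(a) - a * int(b))."""
    int_a = indefinite_integral(a, h)
    int_b = indefinite_integral(b, h)
    integrand = [bv * ia - av * ib for av, bv, ia, ib in zip(a, b, int_a, int_b)]
    return 0.5 * trapezoid(integrand, h)

def inner(a, b, h):
    """L^2 inner product (a, b)."""
    return trapezoid([x * y for x, y in zip(a, b)], h)

h = 0.005
xs = [-10.0 + i * h for i in range(4001)]
u = [math.exp(-x * x) for x in xs]                 # u(x) = exp(-x^2)
du = [-2.0 * x * math.exp(-x * x) for x in xs]     # its exact derivative
v = [math.exp(-(x - 1.0) ** 2) for x in xs]        # a shifted Gaussian

lhs = omega(du, v, h)   # Omega(du, v)
rhs = inner(u, v, h)    # (u, v)
```

Here `lhs` and `rhs` agree up to quadrature error, and `omega(u, v, h)` changes sign when its arguments are swapped, as a skew form should.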

One important consequence of this is the weak non-degeneracy of $\Omega$. For, if
$i_v\Omega$ is zero, then in particular
$(u, v) = \Omega(\partial u, v) = -\Omega(v, \partial u) = -(i_v\Omega)(\partial u) = 0$
for all $u$, so $v = 0$.

$\Omega$ is clearly a skew-bilinear form on $P$. Since $P$ is a vector space, we can as
usual identify $P$ with its tangent space at every point, and then $\Omega$ becomes a
"constant" 2-form on $P$. Since it is constant, of course $d\Omega = 0$. (Below we will
exhibit an explicit 1-form $\omega$ on $P$ such that $d\omega = \Omega$.) Thus $\Omega$
is a symplectic form for $P$, and henceforth we will consider $P$ to be a symplectic
manifold.

A second consequence of $\Omega(\partial u, v) = (u, v)$ is that if $F : P \to \mathbf{R}$
is a smooth function (or "functional") on $P$ that has a gradient $\nabla F$ with
respect to the flat Riemannian structure on $P$ defined by the $L^2$ inner product,
then the symplectic gradient of $F$ also exists and is given by
$(\nabla_s F)_u = \partial((\nabla F)_u)$. Recall that $dF$, the differential of $F$,
is the 1-form on $P$ defined by

$$dF_u(v) = \frac{d}{d\epsilon}\Big|_{\epsilon=0} F(u + \epsilon v),$$

and the gradient of $F$ is the vector field dual to $dF$ with respect to the $L^2$
inner product (if such a vector field indeed exists), i.e., it is characterized by
$(dF)_u(v) = ((\nabla F)_u, v)$. Since
$((\nabla F)_u, v) = \Omega(\partial(\nabla F)_u, v)$, it then follows that
$(\nabla_s F)_u$ also exists and equals $\partial((\nabla F)_u)$.

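We shall only consider functions $F : P \to \mathbf{R}$ of the type normally considered
in the Calculus of Variations, i.e., of the form:

$$F(u) = \int_{-\infty}^{\infty} \tilde F(u, u_x, u_{xx}, \ldots)\,dx,$$

where $\tilde F : \mathbf{R}^{k+1} \to \mathbf{R}$ is a polynomial function without a
constant term. Then the usual integration by parts argument of the Calculus of
Variations shows that such an $F$ has a gradient, given by:

$$(\nabla F)_u = \frac{\partial \tilde F}{\partial u} - \partial\Big(\frac{\partial \tilde F}{\partial u_x}\Big) + \partial^2\Big(\frac{\partial \tilde F}{\partial u_{xx}}\Big) - \cdots$$

Remark. The above formula is written using the standard but somewhat illogical
conventions of the Calculus of Variations and needs a little interpretation.
$\tilde F$ is a function of variables $y = (y_0, y_1, y_2, \ldots, y_k)$, and for example
$\partial\tilde F/\partial u_{xx}$ really means the function on $\mathbf{R}$ whose value
at $x$ is $\partial\tilde F/\partial y_2$ evaluated at
$y = (u^{(0)}(x), u^{(1)}(x), u^{(2)}(x), \ldots, u^{(k)}(x))$.

From what we saw above, the symplectic gradient of such an $F$ exists and is given by

$$(\nabla_s F)_u = \partial\Big(\frac{\partial \tilde F}{\partial u}\Big) - \partial^2\Big(\frac{\partial \tilde F}{\partial u_x}\Big) + \partial^3\Big(\frac{\partial \tilde F}{\partial u_{xx}}\Big) - \cdots$$

Thus every such $F$ is a Hamiltonian function on $P$, defining the Hamiltonian flow
$\dot u = (\nabla_s F)_u$, where $u(t)$ denotes a smooth curve in $P$. If instead of
$u(t)(x)$ we write $u(x, t)$, this symbolic ODE in the manifold $P$ becomes the PDE:

$$u_t = \partial\Big(\frac{\partial \tilde F}{\partial u}\Big) - \partial^2\Big(\frac{\partial \tilde F}{\partial u_x}\Big) + \partial^3\Big(\frac{\partial \tilde F}{\partial u_{xx}}\Big) - \cdots$$

and in particular if we take $\tilde F(u, u_x) = -u^3 + u_x^2/2$, then we get the KdV
equation in standard form:
$u_t = \partial(-3u^2) - \partial^2(u_x) = -6uu_x - u_{xxx}$.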
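As a sanity check on the gradient formula, one can compare the directional derivative of the KdV Hamiltonian $F(u) = \int(-u^3 + u_x^2/2)\,dx$ with the pairing $((\nabla F)_u, v)$, where $(\nabla F)_u = -3u^2 - u_{xx}$. The sketch below is illustrative only: the Gaussian test functions, the grid, and the step `eps` are assumptions made for the example.

```python
import math

H = 0.005
XS = [-10.0 + i * H for i in range(4001)]

def integrate(values):
    """Composite trapezoid rule on the fixed grid XS."""
    return H * (sum(values) - 0.5 * (values[0] + values[-1]))

# Test point u and direction v, with exact derivatives.
def u(x):    return math.exp(-x * x)
def u_x(x):  return -2.0 * x * math.exp(-x * x)
def u_xx(x): return (4.0 * x * x - 2.0) * math.exp(-x * x)
def v(x):    return math.exp(-(x - 1.0) ** 2)
def v_x(x):  return -2.0 * (x - 1.0) * math.exp(-(x - 1.0) ** 2)

def kdv_hamiltonian(eps):
    """F(u + eps*v) for the integrand -w^3 + w_x^2 / 2."""
    vals = []
    for x in XS:
        w = u(x) + eps * v(x)
        w_x = u_x(x) + eps * v_x(x)
        vals.append(-w ** 3 + 0.5 * w_x ** 2)
    return integrate(vals)

eps = 1e-4
# Central difference approximation of dF_u(v).
directional = (kdv_hamiltonian(eps) - kdv_hamiltonian(-eps)) / (2.0 * eps)
# Pairing of v with the claimed gradient -3u^2 - u_xx.
gradient_pairing = integrate([(-3.0 * u(x) ** 2 - u_xx(x)) * v(x) for x in XS])
```

The two numbers `directional` and `gradient_pairing` agree to within the finite-difference error, which is what the integration by parts argument predicts.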

Remark. The formula defining $\Omega$ can be motivated as follows. Define linear
functionals $p_x$ and $q_x$ on $P$ by $q_x(u) = u(x)$ and $p_x(u) = \int u(x)$. (Think
of these as providing "continuous coordinates" for $P$.) These give rise to
differential 1-forms $dp_x$ and $dq_x$ on $P$. Of course, since $p_x$ and $q_x$ are
linear, at every point $u$ of $P$, we have $dp_x = p_x$ and $dq_x = q_x$. Then $\Omega$
can now be written in the suggestive form $\Omega = \sum_x dp_x \wedge dq_x$, where
$\sum_x$ is shorthand for $\int_{-\infty}^{\infty} dx$. This suggests that we define a
1-form $\omega$ on $P$ by $\omega = \sum_x p_x\,dq_x$, i.e.,
$\omega_w(u) = \int_{-\infty}^{\infty} \int w(x)\,u(x)\,dx$. Consider this as a function
$f(w)$ on $P$ and let us compute its directional derivative at $w$ in the direction
$v$, $(vf)(w) = \frac{d}{d\epsilon}\big|_{\epsilon=0} f(w + \epsilon v)$. We clearly get
$v(\omega(u)) = \int_{-\infty}^{\infty} \int v(x)\,u(x)\,dx$. Since $u$ and $v$ are
constant vector fields, their bracket $[u, v]$ is zero, and we calculate
$d\omega(u, v) = v(\omega(u)) - u(\omega(v)) = \Omega(u, v)$, as expected.

We now again specialize to the phase space $P$ for the KdV equation, namely the
Schwartz space $\mathcal{S}(\mathbf{R})$ with its $L^2$ inner product $(u, v)$ and
symplectic form $\Omega(u, v)$, related by $\Omega(\partial u, v) = (u, v)$. Then, since
$\nabla_s F = \partial(\nabla F)$, we obtain the formula

$$\{F_1, F_2\} = \Omega(\nabla_s F_2, \nabla_s F_1) = \Omega(\partial\nabla F_2, \partial\nabla F_1) = (\nabla F_2, \partial(\nabla F_1))$$

for Poisson brackets in terms of the Riemannian structure for $P$, and in particular
we see that $F_1$ and $F_2$ are in involution if and only if the two vector fields
$\nabla F_1$ and $\partial\nabla F_2$ on $P$ are everywhere orthogonal.

4. The Inverse Scattering Method 

In 1967, in what would prove to be one of the most cited mathematical papers
in history, [GGKM], Clifford Gardner, John Greene, Martin Kruskal, and Robert
Miura introduced an ingenious method, called the Inverse Scattering Transform
(IST), for solving the KdV equation. In the years that followed, the IST changed
applied mathematics like no other tool since the Fourier Transform (to which it is
closely related) and it soon became clear that it was the key to understanding the
remarkable properties of soliton equations.

Before starting to explain the IST, we recall the basic philosophy of using
"transforms" to solve ODE. Suppose we are interested in some evolution equation
$\dot x = X(x)$ on a smooth manifold $M$. That is, $X$ is a smooth vector field on $M$
that generates a flow $\phi_t$ on $M$. Usually our goal is to understand the dynamical
properties of this flow, and perhaps get an explicit "formula" for $\phi_t(x)$, at
least for some initial conditions $x$. A transform is a diffeomorphism $T$ of $M$ onto
some other manifold $N$, mapping the vector field $X$ onto a vector field
$Y = DT(X)$ on $N$. If $\psi_t$ is the flow generated by $Y$, then clearly
$T(\phi_t(x)) = \psi_t(Tx)$, and it follows that if we understand $\psi_t$ well, and
moreover have explicit methods for computing $T(x)$ and $T^{-1}(y)$, then we in effect
also know all about $\phi_t$.

It is important to realize that there is usually more at stake than just finding 
particular solutions of the original initial value problem. Essential structural fea- 
tures of the flow that are hidden from view in the original form of the evolution 
equation may become manifest when viewed in the transform space N. 

For example, consider the case of a linear evolution equation $\dot x = X(x)$ on
some vector space $M$. We can formally "solve" such an equation in the form
$x(t) = \exp(tX)x(0)$. However, explicit evaluation of the linear operator
$\exp(tX)$ is not generally feasible, nor does the formula provide much insight into
the structure of the flow. But suppose we can find a linear diffeomorphism
$T : M \to N$ so that the linear operator $Y = TXT^{-1}$ is diagonal in some "basis"
(discrete or continuous) $\{w_\alpha\}$ for $N$, say $Yw_\alpha = \lambda_\alpha w_\alpha$.
Then $\exp(tY)w_\alpha = e^{\lambda_\alpha t}w_\alpha$, hence if
$y(0) = \sum_\alpha y_\alpha w_\alpha$ then the solution to the initial value problem
$\dot y = Y(y)$ with initial value $y(0)$ is
$y(t) = \sum_\alpha (e^{\lambda_\alpha t}y_\alpha)w_\alpha$. Not only do we have an
explicit formula for $\psi_t$, but we see the important structural fact that the flow
is just a direct sum (or integral) of uncoupled one-dimensional flows, something not
obvious when viewing the original flow.
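A toy instance of this philosophy, offered only as an illustration: take $M = \mathbf{R}^2$ and the linear vector field with matrix $X = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$, whose eigenvectors are $(1, 1)/\sqrt2$ and $(1, -1)/\sqrt2$ with eigenvalues $\pm 1$. The matrix and the function names below are hypothetical choices for the example.

```python
import math

def solve_by_diagonalizing(x0, t):
    """Solve x' = X x for X = [[0, 1], [1, 0]] by passing to the eigenbasis."""
    s = 1.0 / math.sqrt(2.0)
    # Transform T: coordinates of x0 in the orthonormal eigenbasis.
    y_plus = s * (x0[0] + x0[1])     # eigenvalue +1
    y_minus = s * (x0[0] - x0[1])    # eigenvalue -1
    # In the transform space the flow is two uncoupled one-dimensional flows.
    y_plus *= math.exp(t)
    y_minus *= math.exp(-t)
    # Inverse transform back to the original coordinates.
    return (s * (y_plus + y_minus), s * (y_plus - y_minus))

def solve_directly(x0, t):
    """Closed form exp(tX) = [[cosh t, sinh t], [sinh t, cosh t]]."""
    c, sh = math.cosh(t), math.sinh(t)
    return (c * x0[0] + sh * x0[1], sh * x0[0] + c * x0[1])

via_transform = solve_by_diagonalizing((1.0, -2.0), 0.7)
direct = solve_directly((1.0, -2.0), 0.7)
```

Both routes give the same point; the transform route makes the decoupling into one-dimensional flows explicit.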

This is precisely why the Fourier transform is such a powerful tool for analyzing
constant coefficient linear PDE: it simultaneously diagonalizes all such operators!
Since the Fourier transform is an excellent model for understanding the more
complex IST, let us quickly review it in our current context. It will be convenient
to complexify $P$ temporarily, i.e., regard our phase space as the complex vector
space of complex-valued Schwartz functions on $\mathbf{R}$. Then the Fourier
Transform, $v \mapsto w = \mathcal{F}(v)$, is a linear diffeomorphism of $P$ with
itself, defined by
$w(\alpha) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} v(x)e^{-i\alpha x}\,dx$, and the
Inverse Fourier Transform, $w \mapsto v = \mathcal{IF}(w)$, is given by
$v(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} w(\alpha)e^{i\alpha x}\,d\alpha$.

Given any $(n+1)$-tuple of real numbers $a = (a_0, \ldots, a_n)$, we let $F_a(y)$ denote
the polynomial $a_0 y + a_1 y^3 + \ldots + a_n y^{2n+1}$, and $F_a(\partial)$ the
constant coefficient linear differential operator
$a_0\partial + a_1\partial^3 + \ldots + a_n\partial^{2n+1}$. Note that $F_a(\partial)$
is a vector field on $P$. In fact, if we put
$\tilde H_a(v^{(0)}, \ldots, v^{(n)}) = \frac{1}{2}\sum_{j=0}^{n} (-1)^j a_j (v^{(j)})^2$,
and define the corresponding functional
$H_a(v) = \int_{-\infty}^{\infty} \tilde H_a(v^{(0)}, \ldots, v^{(n)})\,dx$, then clearly
$F_a(\partial) = \nabla_s H_a$. It is trivial that if $b = (b_0, \ldots, b_m)$ is some
other $(m+1)$-tuple of real numbers then $[F_a(\partial), F_b(\partial)] = 0$, i.e., all
these differential operators (or vector fields) on $P$ commute, and it is easy to
check directly that $\{H_a, H_b\} = 0$, i.e., that the corresponding Hamiltonian
functions Poisson commute.

The transform, $G_a$, of the vector field $F_a(\partial)$ under the Fourier Transform is
easy to compute: $G_a(w)(\alpha) = F_a(i\alpha)w(\alpha)$, or in words, the partial
differential operator $F_a(\partial)$ is transformed by $\mathcal{F}$ into
multiplication by the function $F_a(i\alpha)$. In "physicist language", this shows
that the $G_a$ are all diagonal in the continuous basis for $P$ given by the
evaluations $w \mapsto w(\alpha)$.
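The statement that $\mathcal{F}$ turns $F_a(\partial)$ into multiplication by $F_a(i\alpha)$ can be checked on a single function. The sketch below uses a Gaussian and a plain trapezoid quadrature for the transform; the truncation width, the grid size, and the coefficients `a` are assumptions made for the illustration.

```python
import cmath
import math

def fourier_transform(f, alpha, half_width=12.0, n=4800):
    """Approximate (1/sqrt(2*pi)) * integral of f(x) exp(-i*alpha*x) dx."""
    h = 2.0 * half_width / n
    total = 0j
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * cmath.exp(-1j * alpha * x)
    return total * h / math.sqrt(2.0 * math.pi)

def symbol(a, alpha):
    """F_a(i*alpha) for F_a(y) = a_0 y + a_1 y^3 + ... + a_n y^(2n+1)."""
    return sum(a_j * (1j * alpha) ** (2 * j + 1) for j, a_j in enumerate(a))

# v(x) = exp(-x^2/2) and its first and third derivatives, in closed form.
def v(x):    return math.exp(-0.5 * x * x)
def v1(x):   return -x * math.exp(-0.5 * x * x)
def v3(x):   return (3.0 * x - x ** 3) * math.exp(-0.5 * x * x)

a = (2.0, -0.5)          # F_a(d) = 2 d - 0.5 d^3
alpha = 1.5

def applied(x):
    """(F_a(d) v)(x)."""
    return a[0] * v1(x) + a[1] * v3(x)

transformed_operator = fourier_transform(applied, alpha)
multiplied = symbol(a, alpha) * fourier_transform(v, alpha)
```

The values `transformed_operator` and `multiplied` coincide up to quadrature error, and `fourier_transform(v, alpha)` reproduces the known transform $e^{-\alpha^2/2}$ of the Gaussian.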

Before going on to consider the Scattering Transform we should mention another
classical and elementary transform, the one linearizing Burgers' Equation,
$v_t = v_{xx} - 2vv_x$. The transform, $CH$, mapping $v$ to $w$, is
$w = \exp(-\int v)$, and the inverse transform $ICH$ that recovers $v$ from $w$ is
$v = -\partial\log(w) = -\partial w/w$. Clearly $w$ must be positive for this to be
defined, and it is easily checked that if $w$ is a positive solution of the linear
heat conduction (or diffusion) equation $w_t = w_{xx}$ then $v$ satisfies Burgers'
Equation. So if we start with any positive integrable function $w(x, 0)$, we can use
the Fourier Transform method to find $w(x, t)$ satisfying the heat equation, and then
$v(x, t) = -w_x(x, t)/w(x, t)$ will give a solution of Burgers' Equation. ($CH$ is
usually referred to as the Cole-Hopf Transform, but the fact that it linearizes
Burgers' Equation was actually pointed out by Forsyth in 1906, four decades before
Cole and Hopf each independently rediscovered it.)
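A quick numerical illustration of the Cole-Hopf transform, under the assumption of one particular positive heat solution, $w(x, t) = 1 + e^{k^2 t - kx}$ (chosen here only because $w_t = w_{xx}$ is evident for it). The finite-difference step and the sample points are likewise illustrative.

```python
import math

def heat_solution(x, t, k=1.0):
    """w(x, t) = 1 + exp(k^2 t - k x), a positive solution of w_t = w_xx."""
    return 1.0 + math.exp(k * k * t - k * x)

def heat_solution_x(x, t, k=1.0):
    """Exact x-derivative of the heat solution above."""
    return -k * math.exp(k * k * t - k * x)

def cole_hopf_inverse(x, t):
    """v = -w_x / w, the candidate solution of Burgers' Equation."""
    return -heat_solution_x(x, t) / heat_solution(x, t)

def burgers_residual(x, t, h=1e-3):
    """Central-difference estimate of v_t - v_xx + 2 v v_x."""
    v = cole_hopf_inverse
    v_t = (v(x, t + h) - v(x, t - h)) / (2.0 * h)
    v_x = (v(x + h, t) - v(x - h, t)) / (2.0 * h)
    v_xx = (v(x + h, t) - 2.0 * v(x, t) + v(x - h, t)) / (h * h)
    return v_t - v_xx + 2.0 * v(x, t) * v_x

residuals = [burgers_residual(x, t) for x in (-1.0, 0.0, 2.0) for t in (0.0, 0.5)]
```

All entries of `residuals` are zero up to discretization error, consistent with $v_t = v_{xx} - 2vv_x$.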

1. Lax Equations: KdV as an Isospectral Flow 

In discussing the Inverse Scattering Transform it will be useful to have available
an interesting reinterpretation of the KdV equation as formulated by Peter Lax.
Namely, if $u(x, t)$ is a solution of the KdV equation, and we consider the
one-parameter family $L(t)$ of self-adjoint operators on $L^2(\mathbf{R})$ that are
given by the Schrödinger operators with potentials $u(t)(x) = u(x, t)$ (i.e.,
$L(t)\psi(x) = -\frac{d^2}{dx^2}\psi(x) + u(x, t)\psi(x)$), then these operators are
isospectral, and in fact unitarily equivalent. That is, there is a smooth
one-parameter family $U(t)$ of unitary operators on $L^2(\mathbf{R})$ such that
$U(0) = I$ and $L(t) = U(t)L(0)U(t)^{-1}$.

By the way, in the following it will be convenient to take KdV in the form
$u_t - 6uu_x + u_{xxx} = 0$.

Suppose we have a smooth one-parameter family $U(t)$ of unitary transformations
of a Hilbert space $H$ with $U(0) = I$. $U_t(t)$, the derivative of $U(t)$, is a tangent
vector at $U(t)$ of the group $U(H)$ of unitary transformations of $H$, so
$B(t) = U_t(t)U(t)^{-1} = U_t(t)U(t)^*$ is a tangent vector to $U(H)$ at the identity,
$I$. Differentiating $UU^* = I$ gives $U_tU^* + UU_t^* = 0$, and since $U_t = BU$ and
$U_t^* = U^*B^*$, $0 = BUU^* + UU^*B^*$, so $B^* = -B$, i.e., $B(t)$ is a family of
skew-adjoint operators on $H$. Conversely, a smooth map $t \mapsto B(t)$ of
$\mathbf{R}$ into the skew-adjoint operators defines a time-dependent right invariant
vector field $X_U(t) = B(t)U$ on $U(H)$ and so (at least in finite dimensions) a
smooth curve $U(t)$ of unitary operators starting from $I$ such that
$U_t(t) = B(t)U(t)$.

Now suppose that $L(0)$ is a self-adjoint operator on $H$, and define a family of
conjugate operators $L(t)$ by $L(t) = U(t)L(0)U(t)^{-1}$, so
$L(0) = U(t)^*L(t)U(t)$. Differentiating the latter with respect to $t$,
$0 = U_t^*LU + U^*L_tU + U^*LU_t = U^*(-BL + L_t + LB)U$. Hence, writing
$[B, L] = BL - LB$ as usual for the commutator of $B$ and $L$, we see that $L(t)$
satisfies the so-called Lax Equation, $L_t = [B, L]$.
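In finite dimensions the Lax Equation is easy to experiment with. The sketch below is a toy example, not taken from the text: $H = \mathbf{R}^2$, $L(0) = \mathrm{diag}(1, 3)$, and $U(t)$ is rotation through the angle $t^2$, so that $B(t) = U_tU^{-1}$ is $2t$ times the generator of rotations.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

L0 = [[1.0, 0.0], [0.0, 3.0]]       # a symmetric (self-adjoint) operator

def U(t):
    """Rotation through the angle t^2; orthogonal, with U(0) = I."""
    c, s = math.cos(t * t), math.sin(t * t)
    return [[c, -s], [s, c]]

def B(t):
    """B = U_t U^{-1} = 2t * [[0, -1], [1, 0]], a skew-symmetric matrix."""
    return [[0.0, -2.0 * t], [2.0 * t, 0.0]]

def L(t):
    """L(t) = U(t) L(0) U(t)^{-1}; the inverse of a rotation is its transpose."""
    u = U(t)
    return matmul(matmul(u, L0), transpose(u))

def lax_residual(t, h=1e-5):
    """Largest entry of L_t - [B, L], with L_t from a central difference."""
    lp, lm, l, b = L(t + h), L(t - h), L(t), B(t)
    bl, lb = matmul(b, l), matmul(l, b)
    return max(
        abs((lp[i][j] - lm[i][j]) / (2.0 * h) - (bl[i][j] - lb[i][j]))
        for i in range(2) for j in range(2)
    )

residual = lax_residual(0.7)
trace = L(0.7)[0][0] + L(0.7)[1][1]
determinant = L(0.7)[0][0] * L(0.7)[1][1] - L(0.7)[0][1] * L(0.7)[1][0]
```

The residual vanishes up to finite-difference error, while the trace and determinant of $L(t)$, and hence its eigenvalues 1 and 3, do not change with $t$.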

Given a smooth family of skew-adjoint operators $B(t)$, the Lax Equation is a
time-dependent linear ODE in the vector space $S$ of self-adjoint operators on $H$,
whose special form expresses the fact that the evolution is by unitary conjugation.
Indeed, since the commutator of a skew-adjoint operator and a self-adjoint operator
is again self-adjoint, $B(t)$ defines a time-dependent vector field, $Y$, on $S$ by
$Y(t)(L) = [B(t), L]$. Clearly a smooth curve $L(t)$ in $S$ satisfies the Lax Equation
if and only if it is a solution curve of $Y$. By uniqueness of solutions of linear
ODE, the solution $L(t)$ of this ODE with initial condition $L(0)$ must be the
one-parameter family $U(t)L(0)U(t)^{-1}$ constructed above.

Given any $\psi(0)$ in $H$, define $\psi(t) = U(t)\psi(0)$. Since
$U(t)L(0) = L(t)U(t)$, it follows that if $\psi(0)$ is an eigenvector of $L(0)$
belonging to the eigenvalue $\lambda$, then $\psi(t)$ is an eigenvector of $L(t)$
belonging to the same eigenvalue $\lambda$. Differentiating the relation defining
$\psi(t)$ gives $\psi_t = B\psi(t)$, so we may consider $\psi(t)$ to be defined as the
solution of this linear ODE with initial value $\psi(0)$. Since this is one of the
main ways in which we will use Lax Equations, we will restate it as what we shall
call the:

Isospectral Principle. Let $L(t)$ and $B(t)$ be smooth one-parameter families of
self-adjoint and skew-adjoint operators respectively on a Hilbert space $H$,
satisfying the Lax Equation $L_t = [B, L]$, and let $\psi(t)$ be a curve in $H$ that is
a solution of the time-dependent linear ODE $\psi_t = B\psi$. If the initial value,
$\psi(0)$, is an eigenvector of $L(0)$ belonging to an eigenvalue $\lambda$, then
$\psi(t)$ is an eigenvector of $L(t)$ belonging to the same eigenvalue $\lambda$.

Remark. There is a more general (but less precise) version of the Isospectral
Principle that follows by an almost identical argument. Let $V$ be any topological
vector space and $B(t)$ a family of linear operators on $V$ such that the evolution
equation $U_t = BU$ is well-defined. This means that for each $\psi(0)$ in $V$ there
should exist a unique solution to the time-dependent linear ODE
$\psi_t(t) = B(t)\psi(t)$. The evolution operator $U(t)$ is of course then defined by
$U(t)\psi(0) = \psi(t)$, so $U_t = BU$. Then clearly the conclusion of the Isospectral
Principle still holds. That is to say, if a smooth family of linear operators $L(t)$
on $V$ satisfies the Lax Equation $L_t = [B, L]$, then $U(t)L(0) = L(t)U(t)$, so if
$L(0)\psi(0) = \lambda\psi(0)$ then $L(t)\psi(t) = \lambda\psi(t)$.

We now apply the above with $H = L^2(\mathbf{R})$. We will see that if $u$ satisfies
KdV, then the family of Schrödinger operators $L(t)$ on $H$ defined above satisfies
the Lax Equation $L_t = [B, L]$, where

$$B(t)\psi(x) = -4\psi_{xxx}(x) + 3\big(u(x, t)\psi_x(x) + (u(x, t)\psi(x))_x\big),$$

or more succinctly, $B = -4\partial^3 + 3(u\partial + \partial u)$. Here and in the
sequel it is convenient to use the same symbol both for an element $w$ of the
Schwartz space, $\mathcal{S}(\mathbf{R})$, and for the bounded self-adjoint
multiplication operator $v \mapsto wv$ on $H$. Since $H$ is infinite dimensional and
our operators $B$ and $L$ are unbounded on $H$, some care is needed for a rigorous
treatment. But this is relatively easy. Note that all the operators involved have
the Schwartz space as a common dense domain, so we can use the preceding remark
taking $V = \mathcal{S}(\mathbf{R})$ (we omit details).

Note that since $\partial$ is skew-adjoint, so is any odd power, and in particular
$4\partial^3$ is skew-adjoint. Also, the multiplication operator $u$ is self-adjoint,
while the anti-commutator of a self-adjoint and a skew-adjoint operator is
skew-adjoint, so $u\partial + \partial u$ and hence $B$ is indeed skew-adjoint.

Since clearly $L_t = u_t$, while $u_t - 6uu_x + u_{xxx} = 0$ by assumption, to prove
that $L_t = [B, L]$ we must check that $[B, L] = 6uu_x - u_{xxx}$. Now
$[B, L] = 4[\partial^3, \partial^2] - 4[\partial^3, u] - 3[u\partial, \partial^2] + 3[u\partial, u] - 3[\partial u, \partial^2] + 3[\partial u, u]$,
and it is easy to compute the six commutator relations
$[\partial^3, \partial^2] = 0$,
$[\partial^3, u] = u_{xxx} + 3u_{xx}\partial + 3u_x\partial^2$,
$[u\partial, \partial^2] = -u_{xx}\partial - 2u_x\partial^2$,
$[u\partial, u] = uu_x$,
$[\partial u, \partial^2] = -3u_{xx}\partial - 2u_x\partial^2 - u_{xxx}$, and
$[\partial u, u] = uu_x$, from which the desired expression for $[B, L]$ is immediate.
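The operator identity $[B, L] = 6uu_x - u_{xxx}$, with $L = -\partial^2 + u$ and $B = -4\partial^3 + 3(u\partial + \partial u)$, can be spot-checked with exact integer arithmetic by letting both sides act on polynomials. This is only a sketch: polynomials are of course not Schwartz functions, but the identity is purely algebraic, and the particular $u$ and $\psi$ below are arbitrary test choices. Polynomials are stored as coefficient lists, lowest degree first.

```python
def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def scale(c, p):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def d(p, order=1):
    """Derivative of the given order."""
    for _ in range(order):
        p = [i * a for i, a in enumerate(p)][1:]
    return trim(p)

def L_op(u, psi):
    """L psi = -psi'' + u psi."""
    return add(scale(-1, d(psi, 2)), mul(u, psi))

def B_op(u, psi):
    """B psi = -4 psi''' + 3 (u psi' + (u psi)')."""
    return add(scale(-4, d(psi, 3)), scale(3, add(mul(u, d(psi)), d(mul(u, psi)))))

u = [0, 2, 0, 1]        # u(x) = x^3 + 2x
psi = [5, -1, 0, 0, 1]  # psi(x) = x^4 - x + 5

commutator = add(B_op(u, L_op(u, psi)), scale(-1, L_op(u, B_op(u, psi))))
expected = mul(add(scale(6, mul(u, d(u))), scale(-1, d(u, 3))), psi)
```

Here `commutator` and `expected` are identical coefficient lists: all derivative terms of order one and two in $\psi$ cancel, leaving multiplication by $6uu_x - u_{xxx}$.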

Let us now apply the Isospectral Principle to this example. 

KdV Isospectrality Theorem. Suppose $u(x, t)$ is a solution of the KdV equation,

$$u_t - 6uu_x + u_{xxx} = 0,$$

whose initial value $u(x, 0)$ is in the Schwartz space $\mathcal{S}(\mathbf{R})$, and
that $\psi(x)$ is an eigenfunction of the Schrödinger Equation with potential
$u(x, 0)$ and eigenvalue $\lambda$:

$$-\frac{d^2}{dx^2}\psi(x) + u(x, 0)\psi(x) = \lambda\psi(x).$$

Let $\psi(x, t)$ be the solution of the evolution equation $\psi_t = B\psi$, i.e.,

$$\frac{\partial\psi}{\partial t} = -4\frac{\partial^3\psi}{\partial x^3} + 3\Big(u(x, t)\frac{\partial\psi}{\partial x}(x, t) + \frac{\partial}{\partial x}\big(u(x, t)\psi(x, t)\big)\Big)$$

with the initial value $\psi(x, 0) = \psi(x)$. Then $\psi(x, t)$ is an eigenfunction for
the Schrödinger Equation with potential $u(x, t)$ and the same eigenvalue $\lambda$:

$$-\psi_{xx}(x, t) + u(x, t)\psi(x, t) = \lambda\psi(x, t),$$

and moreover, if $\psi(x)$ is in $L^2$, then the $L^2$ norm of $\psi(\cdot, t)$ is
independent of $t$. Finally, $\psi(x, t)$ also satisfies the first-order evolution
equation

$$\psi_t - (4\lambda + 2u)\psi_x + u_x\psi = 0.$$

Proof. Except for the final statement this is an immediate application of the
Isospectral Principle. Differentiating the eigenvalue equation for $\psi(x, t)$ with
respect to $x$ gives $\psi_{xxx} = u_x\psi + (u - \lambda)\psi_x$, and substituting this
into the assumed evolution equation for $\psi$ gives the asserted first-order
equation for $\psi$. ■

By the way, it should be emphasized that the essential point is that when a poten- 
tial evolves via KdV then the corresponding Schrodinger operators are isospectral, 
and this is already clearly stated in [GGKM] . Lax's contribution was to explain the 
mechanism behind this remarkable fact, and to formulate it in a way that was easy 
to generalize. In fact, almost all generalizations of the phenomena first recognized 
in KdV have used the Lax Equation as a jumping off place. 

2. The Scattering Data and its Evolution 

We now fix a "potential function" $u$ in the Schwartz space $\mathcal{S}(\mathbf{R})$
and look more closely at the space $E_\lambda(u)$ of $\lambda$ eigenfunctions of the
Schrödinger operator with this potential. By definition, $E_\lambda(u)$ is just the
kernel of the linear operator
$L_u(\psi) = -\frac{d^2\psi}{dx^2} + u\psi - \lambda\psi$ acting on the space
$C^\infty(\mathbf{R})$, and by the elementary theory of second-order linear ODE it is,
for each choice of $\lambda$, a two-dimensional linear subspace of
$C^\infty(\mathbf{R})$. Using the special form of $L_u$ we can describe $E_\lambda(u)$
more precisely. We will ignore the case $\lambda = 0$, and consider the case of
positive and negative $\lambda$ separately.

Suppose $\lambda = -\kappa^2$, $\kappa > 0$. Note that any $\psi$ in $E_\lambda(u)$ will
clearly be of the form $\psi(x) = ae^{\kappa x} + be^{-\kappa x}$ in any interval on
which $u$ vanishes identically. Thus if $u$ has compact support, say $u(x) = 0$ for
$|x| > M$, then we can find a basis $\psi^+_{\lambda,-\infty}$,
$\psi^-_{\lambda,-\infty}$ for $E_\lambda(u)$ such that for $x < -M$,
$\psi^\pm_{\lambda,-\infty}(x) = e^{\pm\kappa x}$, or equivalently
$\psi^+_{\lambda,-\infty}(x)e^{-\kappa x} = 1$ and
$\psi^-_{\lambda,-\infty}(x)e^{\kappa x} = 1$ for $x < -M$. Similarly there is a second
basis $\psi^+_{\lambda,\infty}$, $\psi^-_{\lambda,\infty}$ for $E_\lambda(u)$ such that
$\psi^+_{\lambda,\infty}(x)e^{-\kappa x} = 1$ and
$\psi^-_{\lambda,\infty}(x)e^{\kappa x} = 1$ for $x > M$. When $u$ does not have compact
support but is only rapidly decreasing then it can be shown that there still exist
two bases $\psi^+_{\lambda,-\infty}$, $\psi^-_{\lambda,-\infty}$ and
$\psi^+_{\lambda,\infty}$, $\psi^-_{\lambda,\infty}$ for $E_\lambda(u)$ such that
$\lim_{x\to-\infty}\psi^+_{\lambda,-\infty}(x)e^{-\kappa x} = 1$ and
$\lim_{x\to-\infty}\psi^-_{\lambda,-\infty}(x)e^{\kappa x} = 1$, while
$\lim_{x\to\infty}\psi^+_{\lambda,\infty}(x)e^{-\kappa x} = 1$ and
$\lim_{x\to\infty}\psi^-_{\lambda,\infty}(x)e^{\kappa x} = 1$. (A more descriptive way
of writing these limits is $\psi^+_{\lambda,-\infty}(x) \sim e^{\kappa x}$ and
$\psi^-_{\lambda,-\infty}(x) \sim e^{-\kappa x}$ as $x \to -\infty$, while
$\psi^+_{\lambda,\infty}(x) \sim e^{\kappa x}$ and
$\psi^-_{\lambda,\infty}(x) \sim e^{-\kappa x}$ as $x \to \infty$.) Let us define
functions $f(\lambda)$ and $c(\lambda)$ by
$\psi^+_{\lambda,-\infty} = f(\lambda)\psi^+_{\lambda,\infty} + c(\lambda)\psi^-_{\lambda,\infty}$.
Using these bases it is easy to detect when $\lambda$ is a so-called "discrete
eigenvalue" of $L_u$, i.e., when $E_\lambda(u)$ contains a non-zero element $\psi$ of
$L^2(\mathbf{R})$. We can assume $\psi$ has $L^2$ norm one, and since
$\psi^-_{\lambda,-\infty}$ blows up at $-\infty$ while $\psi^+_{\lambda,\infty}$ blows
up at $\infty$, $\psi$ must be both a multiple of $\psi^+_{\lambda,-\infty}$ and of
$\psi^-_{\lambda,\infty}$, and since $\psi \ne 0$ it follows that $f(\lambda) = 0$.
Conversely, if $f(\lambda) = 0$ then
$\psi^+_{\lambda,-\infty} = c(\lambda)\psi^-_{\lambda,\infty}$ decays exponentially both
at $\infty$ and $-\infty$ and so we can normalize it to get an element of
$E_\lambda(u)$ with $L^2$ norm one. Thus the discrete eigenvalues of $L_u$ are
precisely the roots of the function $f$.

It follows from standard arguments of Sturm-Liouville theory that in fact $L_u$
has only finitely many discrete eigenvalues, $\lambda_1, \ldots, \lambda_N$, with
corresponding $L^2$ normalized eigenfunctions $\psi_1, \ldots, \psi_N$, and these
determine so-called "normalization constants" $c_1, \ldots, c_N$ by
$\psi_n = c_n\psi^-_{\lambda_n,\infty}$, i.e., if we write $\lambda_n = -\kappa_n^2$,
then $c_n$ is characterized by $\psi_n(x) \sim c_ne^{-\kappa_n x}$ as $x \to \infty$. We
note that the $\psi_n$ and hence the normalization constants $c_n$ are only determined
up to sign, but we will only use $c_n^2$ in the Inverse Scattering Transform.

For $\lambda = k^2$, $k > 0$ there are similar considerations. In this case if $u(x)$
vanishes for $|x| > M$ then any element of $E_\lambda(u)$ will be of the form
$ae^{ikx} + be^{-ikx}$ for $x < -M$ and also of the form $ce^{ikx} + de^{-ikx}$ for
$x > M$. If $u$ is only rapidly decaying then we can still find bases
$\psi^+_{\lambda,-\infty}$, $\psi^-_{\lambda,-\infty}$ and $\psi^+_{\lambda,\infty}$,
$\psi^-_{\lambda,\infty}$ for $E_\lambda(u)$ such that
$\psi^+_{\lambda,-\infty}(x) \sim e^{ikx}$ and
$\psi^-_{\lambda,-\infty}(x) \sim e^{-ikx}$ as $x \to -\infty$, while
$\psi^+_{\lambda,\infty}(x) \sim e^{ikx}$ and
$\psi^-_{\lambda,\infty}(x) \sim e^{-ikx}$ as $x \to \infty$. Then
$\psi^-_{\lambda,-\infty} = \alpha\psi^-_{\lambda,\infty} + \beta\psi^+_{\lambda,\infty}$,
where $\alpha$ can be shown to be non-zero. Dividing by $\alpha$ we get a particular
eigenfunction $\psi_k$, called the Jost solution, with the special asymptotic
behavior $\psi_k(x) \sim a(k)e^{-ikx}$ as $x \to -\infty$ and
$\psi_k(x) \sim e^{-ikx} + b(k)e^{ikx}$ as $x \to \infty$.

The functions $a(k)$ and $b(k)$ are called the transmission coefficient and reflection
coefficient respectively, and $b(k)$ together with the above normalizing constants
$c_1, \ldots, c_N$ make up the "Scattering Data", $S(u)$ for $u$.

While it is perhaps intuitively clear that the bases $\psi^\pm_{\lambda,\pm\infty}$
must exist, to supply the asymptotic arguments required for a rigorous proof of the
crucial theorem on the time evolution of the Scattering Data it is essential to give
them precise definitions, and we do this next.

First consider the simpler problem of the first order ODE
$L_u\psi = \frac{d\psi}{dx} - u\psi$. If we make the substitution
$\psi = e^{\lambda x}\phi$, then the eigenvalue equation $L_u(\psi) = \lambda\psi$
becomes $\frac{d\phi}{dx} = u\phi$, so (assuming $u$ depends on a parameter $t$) we have
$\phi(x, t) = \exp\big(\int_{-\infty}^{x} u(\xi, t)\,d\xi\big)$. Note that
$\lim_{x\to-\infty}\phi(x, t) = 1$ while

$$\lim_{x\to\infty}\phi(x, t) = \exp\Big(\int_{-\infty}^{\infty} u(\xi, t)\,d\xi\Big) = c(t),$$

so if $\psi(x, t)$ is an eigenfunction of $L_u$, $\psi(x, t) \sim c(t)e^{\lambda x}$
(i.e., $\lim_{x\to\infty}\psi(x, t)e^{-\lambda x} = c(t)$) and since $u(x, t)$ is rapidly
decaying we can moreover differentiate under the integral sign to obtain
$\psi_t(x, t) \sim c'(t)e^{\lambda x}$. One cannot differentiate asymptotic relations
in general of course, and since we will need a similar relation for eigenfunctions
of Schrödinger operators we must make a short detour to justify it by an
argument similar to the above.

If we now make the substitution $\psi = \phi e^{-\kappa x}$ in the eigenvalue equation
$\psi_{xx} = \kappa^2\psi + u\psi$, then we get after simplifications
$\phi_{xx} - 2\kappa\phi_x = u\phi$, or $\partial(\partial - 2\kappa)\phi = u\phi$.
Recall the method of solving the inhomogeneous equation
$\partial(\partial - 2\kappa)\phi = f$ by "variation of parameters". Since $1$ and
$e^{2\kappa x}$ form a basis for the solutions of the homogeneous equation, we look
for a solution of the form $\phi = \Theta_1 + \Theta_2e^{2\kappa x}$, and to make the
system determined we add the relation $\Theta_1' + \Theta_2'e^{2\kappa x} = 0$. This
leads to the equations $\Theta_1' = -\frac{f}{2\kappa}$ and
$\Theta_2' = \frac{f}{2\kappa}e^{-2\kappa x}$, so
$\phi = -\frac{1}{2\kappa}\int_0^x f(\xi)\,d\xi + \frac{e^{2\kappa x}}{2\kappa}\int_0^x f(\xi)e^{-2\kappa\xi}\,d\xi$.
If we now take $f = u\phi$ (and use $\phi e^{-\kappa x} = \psi$) then we get the
relation

$$\phi(x, t) = \frac{1}{2\kappa}\int_x^0 u(\xi, t)\phi(\xi, t)\,d\xi - \frac{e^{2\kappa x}}{2\kappa}\int_x^0 u(\xi, t)\psi(\xi, t)e^{-\kappa\xi}\,d\xi.$$

Assuming that $-\kappa^2$ is a discrete eigenvalue, and that $\psi$ has $L^2$ norm 1,
$u\psi$ will also be in $L^2$ and we can estimate the second integral using the
Schwarz Inequality, and we see that in fact
$\big|\int_x^0 u(\xi)\psi(\xi)e^{-\kappa\xi}\,d\xi\big| < O(e^{-\kappa x})$, so the second
term is $O(e^{\kappa x})$. It follows that $\psi(x, t) \sim c(t)e^{\kappa x}$ in the
sense that $\lim_{x\to-\infty}\psi(x, t)e^{-\kappa x} = c(t)$, where
$c(t) = \phi(-\infty, t) = \frac{1}{2\kappa}\int_{-\infty}^0 u(\xi, t)\phi(\xi, t)\,d\xi$.
In other words, the normalizing constant is well defined. But what is more
important, it also follows that if $u(x, t)$ satisfies KdV, then the normalizing
constant $c(t)$ for a fixed eigenvalue $-\kappa^2$ is a differentiable function of $t$
and satisfies $\psi_t(x, t) \sim c'(t)e^{\kappa x}$. This follows from the fact that we
can differentiate the formula for $c(t)$ under the integral sign because $u$ is
rapidly decreasing. Note that differentiating the relation $\psi e^{\kappa x} = \phi$
gives $\psi_xe^{\kappa x} = \phi_x - \kappa\phi$. But the formula for $\phi$ shows that
$\phi_x$ converges to zero at $-\infty$, so
$\psi_x(x, t) \sim -\kappa c(t)e^{\kappa x}$. From the KdV Isospectrality Theorem, we
know that if $u(x, t)$ satisfies KdV, then $\psi(x, t)$ satisfies
$\psi_t - (-4\kappa^2 + 2u)\psi_x + u_x\psi = 0$, so the left hand side times
$e^{\kappa x}$ converges to $c'(t) + 4\kappa^2(-\kappa c(t))$ as $x \to \infty$ and hence
$c'(t) - 4\kappa^3c(t) = 0$, so $c(t) = c(0)e^{4\kappa^3t}$.

By a parallel argument (which we omit) it follows that the transmission and
reflection coefficients are also well defined and that the Jost solution
$\psi_k(x, t)$ satisfies $(\psi_k)_t \sim a_t(k, t)e^{-ikx}$ at $-\infty$ and
$(\psi_k)_t \sim b_t(k, t)e^{ikx}$ at $\infty$, and then one can show from the KdV
Isospectrality Theorem that the transmission coefficients are constant, while the
reflection coefficients satisfy $b(k, t) = b(k, 0)e^{8ik^3t}$.

Theorem on Evolution of the Scattering Data. Let $u(t) = u(x, t)$ be a smooth
curve in $\mathcal{S}(\mathbf{R})$ satisfying the KdV equation
$u_t - 6uu_x + u_{xxx} = 0$ and assume that the Schrödinger operator with potential
$u(t)$ has discrete eigenvalues $-\kappa_1^2, \ldots, -\kappa_N^2$ whose corresponding
normalized eigenfunctions have normalization constants $c_1(t), \ldots, c_N(t)$. Let
the transmission and reflection coefficients of $u(t)$ be respectively $a(k, t)$ and
$b(k, t)$. Then the transmission coefficients are all constants of the motion, i.e.,
$a(k, t) = a(k, 0)$, while the Scattering Data, $c_n(t)$ and $b(k, t)$, satisfy:

1) $c_n(t) = c_n(0)e^{4\kappa_n^3t}$;

2) $b(k, t) = b(k, 0)e^{8ik^3t}$.

We note a striking (and important) fact: not only do we now have an explicit and
simple formula for the evolution of the scattering data $S(u(t))$ when $u(t)$ evolves
by the KdV equation, but further this formula does not require any knowledge
of $u(t)$.

The fact that the transmission coefficients $a(k)$ are constants of the motion while
the logarithms of the reflection coefficients, $b(k)$, vary linearly with time suggests
that perhaps they can somehow be regarded as action-angle variables for the KdV
equation, thereby identifying KdV as a completely integrable system in a precise
sense. While $a(k)$ and $b(k)$ are not themselves canonical variables, Zakharov and
Faddeev in [ZF] showed that certain functions of $a$ and $b$ did satisfy the Poisson
commutation relations for action-angle variables. Namely, the functions
$p(k) = (k/\pi)\log|a(k)|^2 = (k/\pi)\log[1 + |b(k)|^2]$ and $q(k) = \arg(b(k))$ satisfy
$\{p(k), q(k')\} = \delta(k - k')$ and $\{p(k), p(k')\} = \{q(k), q(k')\} = 0$.

The above formula for the evolution of the Scattering Data is one of the key 
ingredients for The Inverse Scattering Method, and we are finally in a position to 
describe this elegant algorithm for solving the Cauchy problem for KdV. 

The Inverse Scattering Method 

To solve the KdV initial value problem $u_t - 6uu_x + u_{xxx} = 0$ with given initial
potential $u(x, 0)$ in $\mathcal{S}(\mathbf{R})$:

1) Apply the "Direct Scattering Transform", i.e., find the discrete eigenvalues
$-\kappa_1^2, \ldots, -\kappa_N^2$ for the Schrödinger operator with potential
$u(x, 0)$ and compute the Scattering Data, i.e., the normalizing constants
$c_n(0)$ and the reflection coefficients $b(k, 0)$.

2) Define $c_n(t) = c_n(0)e^{4\kappa_n^3t}$ and $b(k, t) = b(k, 0)e^{8ik^3t}$.

3) Use the Inverse Scattering Transform (described below) to compute $u(t)$
from $c_n(t)$ and $b(k, t)$.
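Step 2 is the only step in which time enters, and it is entirely explicit. A minimal sketch of that step follows; the function name `evolve_scattering_data` and the sample data are hypothetical, and steps 1 and 3 (the direct and inverse transforms) are not implemented here.

```python
import cmath
import math

def evolve_scattering_data(kappas, c0, b0, t):
    """Apply c_n(t) = c_n(0) exp(4 kappa_n^3 t) and b(k, t) = b(k, 0) exp(8 i k^3 t).

    kappas : the numbers kappa_n > 0, where -kappa_n^2 are the discrete eigenvalues
    c0     : the normalizing constants c_n(0)
    b0     : a function k -> b(k, 0), the initial reflection coefficient
    """
    c_t = [c * math.exp(4.0 * kappa ** 3 * t) for kappa, c in zip(kappas, c0)]

    def b_t(k):
        return b0(k) * cmath.exp(8j * k ** 3 * t)

    return c_t, b_t

# Made-up initial data, purely for illustration.
kappas = [0.5, 1.0]
c0 = [1.0, 2.0]

def b0(k):
    return 0.3 * cmath.exp(-k * k)

c_t, b_t = evolve_scattering_data(kappas, c0, b0, t=0.25)
```

Note that the eigenvalues themselves do not move, the normalizing constants grow exponentially, and each $b(k, t)$ only rotates in the complex plane, so $|b(k, t)|$ is constant in time.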

3. The Inverse Scattering Transform 

Recovering the potential $u$ of a Schrödinger operator $L_u$ from the Scattering Data
$S(u)$ was not something invented for the purpose of solving the KdV initial value
problem. Rather, it was a question of basic importance to physicists doing
cyclotron experiments, and the theory was worked out in the mid-1950's by Kay and
Moses [KM], Gelfand and Levitan [GL], and Marchenko [M].

Denote the discrete eigenvalues of $u$ by $-\kappa_1^2, \ldots, -\kappa_N^2$, the
normalizing constants by $c_1, \ldots, c_N$, and the reflection coefficients by
$b(k)$, and define a function

$$B(\xi) = \sum_{n=1}^{N} c_n^2e^{-\kappa_n\xi} + \frac{1}{2\pi}\int_{-\infty}^{\infty} b(k)e^{ik\xi}\,dk.$$

Inverse Scattering Theorem. The potential $u$ can be recovered using the formula
$u(x) = -2\frac{d}{dx}K(x, x)$, where $K(x, z)$ is the unique function on
$\mathbf{R} \times \mathbf{R}$ that is zero for $z < x$ and satisfies the
Gelfand-Levitan-Marchenko Integral Equation:

$$K(x, z) + B(x + z) + \int_{-\infty}^{\infty} K(x, y)B(y + z)\,dy = 0.$$

(For a proof, see [DJ], Chapter 3, Section 3, or [La3], Chapter II.)

We will demonstrate by example how the Inverse Scattering Method can now be 
applied to get explicit solutions of KdV. But first a couple of general remarks about 
solving the Gelfand-Levitan-Marchenko equation. We assume in the following that 
B is rapidly decreasing. 

Let $C(\mathbf{R} \times \mathbf{R})$ denote the Banach space of bounded, continuous
real-valued functions on $\mathbf{R} \times \mathbf{R}$ with the sup norm. Define
$T_B : C(\mathbf{R} \times \mathbf{R}) \to C(\mathbf{R} \times \mathbf{R})$ by the
formula

$$T_B(K)(x, z) = -B(x + z) - \int_{-\infty}^{\infty} K(x, y)B(y + z)\,dy.$$

Then $K$ satisfies the Gelfand-Levitan-Marchenko equation if and only if it is a
fixed-point of $T_B$. It is clear that $T_B$ is Lipschitz with constant
$\|B\|_{L^1}$, so if $\|B\|_{L^1} < 1$ then by the Banach Contraction Principle the
Gelfand-Levitan-Marchenko equation has a unique solution, and it is the limit of the
sequence $K_n$ defined by $K_1(x, z) = -B(x + z)$, $K_{n+1} = T_B(K_n)$.

Secondly, we note that if the function $B$ is "separable" in the sense that it
satisfies an identity of the form $B(x + z) = \sum_{n=1}^{N} X_n(x)Z_n(z)$, then the
Gelfand-Levitan-Marchenko equation takes the form

$$K(x, z) + \sum_{n=1}^{N} X_n(x)Z_n(z) + \sum_{n=1}^{N} Z_n(z)\int_x^{\infty} K(x, y)X_n(y)\,dy = 0.$$

It follows that $K(x, z)$ must have the form
$K(x, z) = \sum_{n=1}^{N} L_n(x)Z_n(z)$. If we substitute this for $K$ in the previous
equation and define $a_{nm}(x) = \int_x^{\infty} Z_m(y)X_n(y)\,dy$ then we have reduced
the problem to solving $N$ linear equations for the unknown functions $L_n$, namely:
$L_n(x) + X_n(x) + \sum_{m=1}^{N} a_{nm}(x)L_m(x) = 0$, or
$X_n(x) + \sum_{m=1}^{N} A_{nm}(x)L_m(x) = 0$, where
$A_{nm}(x) = \delta_{nm} + a_{nm}(x)$. Thus finally we have

$$K(x, x) = -\sum_{n=1}^{N} Z_n(x)\sum_{m=1}^{N} A^{-1}_{nm}(x)X_m(x).$$

4. An Explicit Formula for KdV Multi-Solitons 

A potential $u$ is called "reflectionless" if all the reflection coefficients are
zero. Because of the relation $b(k, t) = b(k, 0)e^{8ik^3t}$, it follows that if
$u(x, t)$ evolves by KdV and if it is reflectionless at $t = 0$ then it is
reflectionless for all $t$. If the discrete eigenvalues of such a potential are
$-\kappa_1^2, \ldots, -\kappa_N^2$ and the normalizing constants are
$c_1, \ldots, c_N$, then $B(\xi) = \sum_{n=1}^{N} c_n^2e^{-\kappa_n\xi}$, so
$B(x + z) = \sum_{n=1}^{N} X_n(x)Z_n(z)$, where $X_n(x) = c_n^2e^{-\kappa_nx}$, and
$Z_n(z) = e^{-\kappa_nz}$, and we are in the separable case just considered. Recall
that
$a_{nm}(x) = \int_x^{\infty} Z_m(y)X_n(y)\,dy = c_n^2\int_x^{\infty} e^{-(\kappa_n+\kappa_m)y}\,dy = c_n^2e^{-(\kappa_n+\kappa_m)x}/(\kappa_n + \kappa_m)$,
and that

$$A_{nm}(x) = \delta_{nm} + a_{nm}(x) = \delta_{nm} + c_n^2e^{-(\kappa_n+\kappa_m)x}/(\kappa_n + \kappa_m).$$

Differentiation gives $\frac{d}{dx}A_{nm}(x) = -c_n^2e^{-(\kappa_n+\kappa_m)x}$, so by
a formula above

$$\begin{aligned}
K(x, x) &= -\sum_{n=1}^{N} Z_n(x)\sum_{m=1}^{N} A^{-1}_{nm}(x)X_m(x) \\
&= \sum_{n=1}^{N} e^{-\kappa_nx}\sum_{m=1}^{N} A^{-1}_{nm}(x)\big(-c_m^2e^{-\kappa_mx}\big) \\
&= \sum_{n=1}^{N}\sum_{m=1}^{N} A^{-1}_{nm}(x)\frac{d}{dx}A_{mn}(x) \\
&= \mathrm{tr}\Big(A^{-1}(x)\frac{d}{dx}A(x)\Big) \\
&= \frac{1}{\det A(x)}\frac{d}{dx}\det A(x) \\
&= \frac{d}{dx}\log\det A(x),
\end{aligned}$$

and so $u(x) = -2\frac{d}{dx}K(x, x) = -2\frac{d^2}{dx^2}\log\det A(x)$.

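If $N = 1$ and we put $\kappa = \kappa_1$ it is easy to see that this formula reduces to our earlier formula for traveling wave solutions of the KdV equation: $u(x,t) = -\frac{\kappa^2}{2}\operatorname{sech}^2(\kappa(x - \kappa^2 t))$. We can also use it to find explicit solutions $u(x,t)$ for $N = 2$. Let $g_i(x,t) = \exp(\kappa_i^3 t - \kappa_i x)$, and set $A = \frac{(\kappa_1-\kappa_2)^2}{(\kappa_1+\kappa_2)^2}$, then

$$u(x,t) = -2\,\frac{\kappa_1^2 g_1 + \kappa_2^2 g_2 + 2(\kappa_1-\kappa_2)^2 g_1 g_2 + A g_1 g_2(\kappa_1^2 g_2 + \kappa_2^2 g_1)}{(1 + g_1 + g_2 + A g_1 g_2)^2}.$$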
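As a consistency check on the two-soliton expression, one can compare a closed form of this shape with $-2\,\partial_x^2 \log(1 + g_1 + g_2 + A g_1 g_2)$ computed by finite differences. The sketch below assumes $g_i = \exp(\kappa_i^3 t - \kappa_i x)$ and $A = (\kappa_1-\kappa_2)^2/(\kappa_1+\kappa_2)^2$; the function names, the sample points and the values of $\kappa_1, \kappa_2, t$ are hypothetical choices made only for illustration.

```python
import math

def g(k, x, t):
    return math.exp(k ** 3 * t - k * x)

def F(x, t, k1, k2):
    """The function whose second logarithmic derivative gives u."""
    A = ((k1 - k2) / (k1 + k2)) ** 2
    g1, g2 = g(k1, x, t), g(k2, x, t)
    return 1 + g1 + g2 + A * g1 * g2

def u_closed(x, t, k1, k2):
    """Closed-form two-soliton expression."""
    A = ((k1 - k2) / (k1 + k2)) ** 2
    g1, g2 = g(k1, x, t), g(k2, x, t)
    num = (k1 ** 2 * g1 + k2 ** 2 * g2 + 2 * (k1 - k2) ** 2 * g1 * g2
           + A * g1 * g2 * (k1 ** 2 * g2 + k2 ** 2 * g1))
    return -2 * num / (1 + g1 + g2 + A * g1 * g2) ** 2

def u_from_log(x, t, k1, k2, h=1e-3):
    """-2 d^2/dx^2 log F, approximated by a central second difference."""
    f = lambda y: math.log(F(y, t, k1, k2))
    return -2 * (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# Illustrative (assumed) parameters and sample points.
max_gap = max(abs(u_closed(x, 0.2, 1.0, 1.7) - u_from_log(x, 0.2, 1.0, 1.7))
              for x in [-3.0, -1.0, 0.0, 0.5, 2.0])
```

Agreement up to discretization error indicates that the numerator and the squared denominator fit together as $-2(F F_{xx} - F_x^2)/F^2$.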

For general $N$ the solutions $u(x,t)$ that we get this way are referred to as the pure
$N$-soliton solutions of the KdV equation. It is not hard to show by an asymptotic
analysis that for large negative and positive times they behave as a superposition of
the above traveling wave solutions, and that after the larger, faster moving waves
have all passed through the slower moving shorter ones and they have become
well-separated, the only trace of their interactions is certain predictable "phase-shifts",
i.e., certain constant translations of the locations of their maxima from where they
would have been had they not interacted. (For details see [L], p. 123.)

5. The KdV Hierarchy 

By oversimplifying a bit, one can give a succinct statement of what makes the KdV
equation, $u_t - 6uu_x + u_{xxx} = 0$, more than just a run-of-the-mill evolution equation;
namely it is equivalent to a Lax equation, $\frac{d}{dt}L_u = [B, L_u]$, expressing that the
corresponding Schrodinger operator $L_u = -\frac{d^2}{dx^2} + u$ is evolving by unitary equivalence,
so that the spectral data for $L_u$ provides many constants of the motion for KdV,
and in fact enough commuting constants of the motion to make KdV completely
integrable.

It is natural to ask whether KdV is unique in that respect, and the answer is
a resounding "No!". In his paper introducing the Lax Equation formulation of
KdV, [La1], Peter Lax already pointed out an important generalization. Recall
that $B = -4\partial^3 + 3(u\partial + \partial u)$. Lax suggested that for each integer $j$ one should
look for an operator of the form $B_j = a\,\partial^{2j+1} + \sum_{i=1}^{j}(b_i\partial^{2i-1} + \partial^{2i-1}b_i)$, where the
operators $b_i$ are to be chosen so as to make the commutator $[B_j, L_u]$ a zero order
operator; that is, $[B_j, L_u]$ should be multiplication by some polynomial, $K_j(u)$, in
$u$ and its derivatives. This requirement imposes $j$ conditions on the $j$ coefficients
$b_i$, and these conditions uniquely determine the $b_i$ as multiplications by certain
polynomials in $u$ and its derivatives. For example, $B_0 = \partial$, and the corresponding
Lax Equation $u_t = K_0(u)$ is $u_t = u_x$, the so-called Linear Advection Equation.
And of course $B_1$ is just our friend $-4\partial^3 + 3(u\partial + \partial u)$, whose corresponding Lax
Equation is KdV.
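The claim that $[B_1, L_u]$ is a zero order operator can be checked exactly on polynomials, since both operators map polynomials in $x$ to polynomials. The sketch below is only an illustration: it represents polynomials as integer coefficient lists (lowest degree first), takes $L_u = -\partial^2 + u$ and $B = -4\partial^3 + 3(u\partial + \partial u)$, and compares $[B, L_u]f$ with $(6uu_x - u_{xxx})f$. The helper names and the sample polynomials `u` and `f` are arbitrary choices, not part of the text.

```python
from itertools import zip_longest

def add(p, q):
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def scale(c, p):
    return [c * a for a in p]

def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def d(p, k=1):
    """k-th derivative of a polynomial given as a coefficient list."""
    for _ in range(k):
        p = [i * a for i, a in enumerate(p)][1:] or [0]
    return p

def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def L(u, f):
    """Schrodinger operator: -f'' + u f."""
    return add(scale(-1, d(f, 2)), mul(u, f))

def B(u, f):
    """-4 f''' + 3 (u f' + (u f)')."""
    return add(scale(-4, d(f, 3)), scale(3, add(mul(u, d(f)), d(mul(u, f)))))

u = [1, -2, 0, 3]        # u(x) = 1 - 2x + 3x^3   (arbitrary sample)
f = [2, 1, -1, 0, 5]     # f(x) = 2 + x - x^2 + 5x^4   (arbitrary sample)

lhs = add(B(u, L(u, f)), scale(-1, L(u, B(u, f))))      # [B, L_u] f
K = add(scale(6, mul(u, d(u))), scale(-1, d(u, 3)))     # 6 u u_x - u_xxx
rhs = mul(K, f)
commutator_ok = trim(lhs) == trim(rhs)
```

Because the arithmetic is done in integers, the comparison is exact; it confirms that the Lax equation for this $B$ reads $u_t = 6uu_x - u_{xxx}$, which is KdV in the form used here.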

Kj(u) is a polynomial in the derivatives of u up through order 2j + 1, and 
the evolution equation ut = Kj(u) is referred to as the j-th higher order KdV 
equation. This whole sequence of flows is known as "The KdV Hierarchy" , and the 
initial value problem for each of these equations can be solved using the Inverse 
Scattering Method in a straightforward generalization from the KdV case. But 
even more remarkably: 

Theorem. Each of the higher order KdV equations defines a Hamiltonian flow on
$P$. That is, for each positive integer $j$ there is a Hamiltonian function $F_j : P \to \mathbb{R}$
(defined by a polynomial differential operator of order $j$, $\tilde F(u^{(0)}, \dots, u^{(j)})$) such that
$K_j(u) = (\nabla_s F_j)_u$. Moreover, all the functions $F_j$ are in involution, so that all the
higher order KdV flows commute with each other.

The proof can be found in [La3], Chapter I. 

It should be pointed out here that the discovery of the constants of the motion 
Fk goes back to the earliest work on KdV as an integrable system. In fact, it came 
out of the research in 1966 by Gardner, Greene, Kruskal, and Miura leading up 
to their paper [GGKM] in which the Inverse Scattering Method was introduced. 
However, the symplectic structure for the phase space of KdV, and the fact that 
these functions were in involution was only discovered considerably later, in 1971 
[G],[ZF]. 

To the best of my knowledge, the higher order KdV equations are not of inde- 
pendent interest. Nevertheless, the above theorem suggests a subtle but important 
change in viewpoint towards the KdV equation — one that proved important in fur- 
ther generalizing the Inverse Scattering Method to cover other evolution equations 
which are of interest for their own sake. Namely, the key player in the Inverse
Scattering Method should not be seen as the KdV equation itself, but rather the
Schrodinger operator $L_u$. If we want to generalize the Inverse Scattering Method,
we should first find other operators L with a "good scattering theory" and then look 
among the Lax Equations L t = [M, L] to find interesting candidates for integrable 
systems that can be solved using scattering methods. 

In fact, this approach has proved important in investigating both finite and 
infinite dimensional Hamiltonian systems, and in the remainder of this article we 
will investigate in detail one such scheme that has not only been arguably the most 
successful in identifying and solving important evolution equations, but has moreover
a particularly elegant and powerful mathematical framework that underlies it. This 
scheme was first introduced by Zakharov and Shabat [ZS] to study an important 
special equation (the so-called Nonlinear Schrodinger Equation, or NLS). Soon 
thereafter, Ablowitz, Kaup, Newell, and Segur [AKNS] showed that one relatively 
minor modification of the Zakharov and Shabat approach recovers the theory of the 
KdV equation, while another leads to an Inverse Scattering Theory analysis for a 
third very important evolution equation, the Sine-Gordon Equation (SGE). AKNS 
went on to develop the Zakharov and Shabat technique into a general method for 
PDE with values in 2 x 2-matrix groups, and ZS further generalized it to the case 
of n x n-matrix groups. Following current custom, we will refer to this method as 
the ZS-AKNS Scheme. 

5. The ZS-AKNS Scheme 
1. Flat Connections and the Lax Equation, ZCC 

To prepare for the introduction of the ZS-AKNS Scheme, we must first develop 
some of the infra-structure on which it is based. This leads quickly to the central 
Lax Equation of the theory, the so-called "Zero-Curvature Condition", (or ZCC). 

First we fix a matrix Lie Group $G$ and denote its Lie algebra by $\mathcal{G}$. That is, $G$
is some closed subgroup of the group $GL(n, \mathbb{C})$ of all invertible $n \times n$ complex matrices, and
$\mathcal{G}$ is the set of all $n \times n$ complex matrices, $X$, such that $\exp(X)$ is in $G$. If you
feel more comfortable working with a concrete example, think of $G$ as the group
$SL(n, \mathbb{C})$ of all $n \times n$ complex matrices of determinant 1, and $\mathcal{G}$ as its Lie algebra
$sl(n, \mathbb{C})$ of all $n \times n$ complex matrices of trace zero. In fact, for the original
ZS-AKNS Scheme, $G = SL(2, \mathbb{C})$ and $\mathcal{G} = sl(2, \mathbb{C})$, and we will carry out most of the
later discussion with these choices, but for what we will do next the precise nature
of $G$ is irrelevant.

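Let $\nabla$ be a flat connection for the trivial principal bundle $\mathbb{R}^2 \times G$. Then we can
write $\nabla = d - \omega$, where $\omega$ is a 1-form on $\mathbb{R}^2$ with values in the Lie algebra $\mathcal{G}$. Using
coordinates $(x, t)$ for $\mathbb{R}^2$ we can then write $\omega = A\,dx + B\,dt$ where $A$ and $B$ are
smooth maps of $\mathbb{R}^2$ into $\mathcal{G}$.

If $X$ is a vector field on $\mathbb{R}^2$, then the covariant derivative operator in the direction
$X$ is $\nabla_X = \partial_X - \omega(X)$, and in particular, the covariant derivatives in the coordinate
directions $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ are $\nabla_{\frac{\partial}{\partial x}} = \frac{\partial}{\partial x} - A$ and $\nabla_{\frac{\partial}{\partial t}} = \frac{\partial}{\partial t} - B$.

Since we are assuming that $\nabla$ is flat, it determines a global parallelism. If $(x_0, t_0)$
is any point of $\mathbb{R}^2$ then we have a map $\psi : \mathbb{R}^2 \to G$, where $\psi(x,t)$ is the parallel
translation operator from $(x_0, t_0)$ to $(x,t)$. Considered as a section of our trivial
principal bundle, $\psi$ is covariant constant, i.e., $\nabla_X \psi = 0$ for any tangent vector
field $X$. In particular, taking $X$ to be $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ gives the relations $\psi_x = A\psi$ and
$\psi_t = B\psi$.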

There are many equivalent ways to express the flatness of the connection $\nabla$. On
the one hand the curvature 2-form $d\omega - \omega \wedge \omega$ is zero. Equivalently, the covariant
derivative operators in the $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial t}$ directions commute, i.e., $[\frac{\partial}{\partial x} - A, \frac{\partial}{\partial t} - B] = 0$,
or finally, equating the cross-derivatives of $\psi$, $(A\psi)_t = \psi_{xt} = \psi_{tx} = (B\psi)_x$.
Expanding the latter gives $A_t\psi + A\psi_t = B_x\psi + B\psi_x$, or $A_t\psi + AB\psi = B_x\psi + BA\psi$,
and right multiplying by $\psi^{-1}$ we arrive at the so-called "Zero-Curvature Condition":
$A_t - B_x - [B, A] = 0$. Rewriting this as $-A_t = -B_x + [B, -A]$, and noting
that $[B, \frac{\partial}{\partial x}] = -B_x$, we see that the Zero-Curvature Condition has an equivalent
formulation as a Lax Equation:

$$\left(\frac{\partial}{\partial x} - A\right)_t = \left[B, \frac{\partial}{\partial x} - A\right], \tag{ZCC}$$

and it is ZCC that plays the central role in the ZS-AKNS Scheme.

Recall what ZCC is telling us. If we look at $t$ as a parameter, then the operator
$\frac{\partial}{\partial x} - A(x, t_0)$ is the covariant derivative in the $x$-direction along the line $t = t_0$, and
the Lax Equation ZCC says that as a function of $t_0$ these operators are all conjugate.
Moreover the operator $\psi(t_0, t_1)$ implementing the conjugation between the time $t_0$
and the time $t_1$ satisfies $\psi_t = B\psi$, which means it is parallel translation from $(x, t_0)$
to $(x, t_1)$ computed by going "vertically" along the curve $t \mapsto (x,t)$. But since
$\frac{\partial}{\partial x} - A(x, t_0)$ generates parallel translation along the horizontal curve $x \mapsto (x, t_0)$,
what this amounts to is the statement that parallel translating horizontally from
$(x_0, t_0)$ to $(x_1, t_0)$ is the same as parallel translation vertically from $(x_0, t_0)$ to
$(x_0, t_1)$ followed by parallel translation horizontally from $(x_0, t_1)$ to $(x_1, t_1)$ followed
by parallel translation vertically from $(x_1, t_1)$ to $(x_1, t_0)$. Thus, in the case of ZCC,
the standard interpretation of the meaning of a Lax Equation reduces to a special
case of the theorem that if a connection has zero curvature then the holonomy
around a contractible path is trivial.

2. Some ZS-AKNS Examples 

The ZS-AKNS Scheme is a method for solving the initial value problem for certain
(hierarchies of) evolution equations on a space of "potentials" $P$. In general $P$ will
be of the form $\mathcal{S}(\mathbb{R}, V)$, where $V$ is some finite dimensional real or complex vector
space, i.e., each potential $u$ will be a map $x \mapsto u(x)$ of Schwartz class from $\mathbb{R}$ into
$V$. (A function $u$ with values in $V$ is of Schwartz class if, for each linear functional
$\ell$ on $V$, the scalar valued function $\ell \circ u$ is of Schwartz class, or equivalently if,
when we write $u$ in terms of a fixed basis for $V$, its components are of Schwartz
class.) The evolution equations in question are of the form $u_t = F(u)$ where the
map $F : P \to P$ is a "polynomial differential operator", i.e., it has the form
$F(u) = p(u, u_x, u_{xx}, \dots)$, where $p$ is a polynomial mapping of $V$ to itself.

When we say we want to solve the initial value (or "Cauchy") problem for such
an equation, we of course mean that given $u^0 = u(x,0)$ in $P$ we want to find
a smooth map $t \mapsto u(t) = u(x,t)$ of $\mathbb{R}$ to $P$ with $u(0) = u^0$ and $u_t(x,t) =
p(u(x,t), u_x(x,t), u_{xx}(x,t), \dots)$. In essence, we want to think of $F$ as a vector
field on $P$ and construct the flow $\phi_t$ that it generates. (Of course, if $P$ were a finite
dimensional manifold, then we could construct the flow $\phi_t$ by solving a system of
ODE's, and as we shall see, the ZS-AKNS Scheme allows us in certain cases to solve
the PDE $u_t = p(u, u_x, u_{xx}, \dots)$ by reducing it to ODE's.)

The first and crucial step in using the ZS-AKNS Scheme to study a particular
such evolution equation consists in setting up an interpretation of $A$ and $B$ so that
the equation $u_t = p(u, u_x, u_{xx}, \dots)$ becomes a special case of ZCC.

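To accomplish this, we first identify $V$ with a subspace of the Lie algebra $\mathcal{G}$ (so that $P = \mathcal{S}(\mathbb{R}, V)$
becomes a subspace of $\mathcal{S}(\mathbb{R}, \mathcal{G})$), and define a map $u \mapsto A(u)$ of $P$ into $C^\infty(\mathbb{R}, \mathcal{G})$
of the form $A(u) = \text{const} + u$, so that if $u$ depends parametrically on $t$ then
$\left(\frac{\partial}{\partial x} - A(u)\right)_t = -u_t$.

Finally (and this is the difficult part) we must define a map $u \mapsto B(u)$ of $P$ into
$C^\infty(\mathbb{R}, \mathcal{G})$ so that $[B(u), \frac{\partial}{\partial x} - A(u)] = -p(u, u_x, u_{xx}, \dots)$.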

To interpret the latter equation correctly, and in particular to make sense out
of the commutator bracket in a manner consistent with our earlier interpretation
of $A$ and $B$, it is important to be clear about the interpretation of $A(u)$ and $B(u)$
as operators, and in particular to be precise about the space on which they are
operating. This is just the space $C^\infty(\mathbb{R}, gl(2, \mathbb{C}))$ of smooth maps $\psi$ of $\mathbb{R}$ into
the space of all complex $2 \times 2$ matrices. Namely, we identify $A(u)$ with the
zero-order differential operator mapping $\psi$ to $A(u)\psi$, the pointwise matrix product of
$A(u)(x)$ and $\psi(x)$, and similarly with $B(u)$. (This is in complete analogy with the
KdV situation, where in interpreting the Schrodinger operator, we identified our
potential $u$ with the operator of multiplication by $u$.) Of course $(\frac{\partial}{\partial x}\psi)(x) = \psi_x(x)$.

We will now illustrate this with three examples: the KdV equation, the Nonlinear
Schrodinger Equation (NLS), and the Sine-Gordon Equation (SGE). In each case
$V$ will be a one-dimensional space that is embedded in the space of off-diagonal
complex matrices $\begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix}$, and in each case $A(u) = a\lambda + u$, where $\lambda$ is a complex
parameter, and $a$ is the constant, diagonal, trace zero matrix $a = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}$.

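Example 1. [AKNS] Take $u(x) = \begin{pmatrix} 0 & q(x) \\ -1 & 0 \end{pmatrix}$, and let

$$B(u) = a\lambda^3 + u\lambda^2 + \begin{pmatrix} \frac{i}{2}q & \frac{i}{2}q_x \\ 0 & -\frac{i}{2}q \end{pmatrix}\lambda + \begin{pmatrix} -\frac{q_x}{4} & -\frac{q^2}{2} - \frac{q_{xx}}{4} \\ \frac{q}{2} & \frac{q_x}{4} \end{pmatrix}.$$

Then an easy computation shows that ZCC is satisfied if and only if $q$ satisfies KdV
in the form $q_t = -\frac{1}{4}(6qq_x + q_{xxx})$.
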
Example 2. [ZS] Take $u(x) = \begin{pmatrix} 0 & q(x) \\ -\bar q(x) & 0 \end{pmatrix}$, and let

$$B(u) = a\lambda^2 + u\lambda + \begin{pmatrix} \frac{i}{2}|q|^2 & \frac{i}{2}q_x \\ \frac{i}{2}\bar q_x & -\frac{i}{2}|q|^2 \end{pmatrix}.$$

In this case ZCC is satisfied if and only if $q(x, t)$ satisfies the so-called Nonlinear
Schrodinger Equation (NLS) $q_t = \frac{i}{2}(q_{xx} + 2|q|^2 q)$.
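The "if" direction of a statement like this can be checked pointwise, because the zero-curvature condition $A_t - B_x - [B, A] = 0$ is an algebraic identity in the values of $q$, $q_x$, $q_{xx}$, $q_t$ at a point. The sketch below assumes $a = \operatorname{diag}(-i, i)$, $A = a\lambda + u$, $B = a\lambda^2 + u\lambda + C$ with $C = \frac{i}{2}\begin{pmatrix} |q|^2 & q_x \\ \bar q_x & -|q|^2 \end{pmatrix}$, and NLS in the form $q_t = \frac{i}{2}(q_{xx} + 2|q|^2 q)$; it draws random values for the derivatives and for $\lambda$ and evaluates the residual. All function names are illustrative.

```python
import random

def mat_add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]

def mat_scale(c, m):
    return [[c * m[i][j] for j in range(2)] for i in range(2)]

def mat_mul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(m, n):
    return mat_add(mat_mul(m, n), mat_scale(-1, mat_mul(n, m)))

def zcc_residual(q, qx, qxx, lam):
    """Largest entry of A_t - B_x - [B, A] when q_t is taken from NLS."""
    qb, qbx, qbxx = q.conjugate(), qx.conjugate(), qxx.conjugate()
    qt = 0.5j * (qxx + 2 * abs(q) ** 2 * q)        # assumed NLS normalization
    a = [[-1j, 0], [0, 1j]]
    u = [[0, q], [-qb, 0]]
    ux = [[0, qx], [-qbx, 0]]
    ut = [[0, qt], [-qt.conjugate(), 0]]
    p, px = q * qb, qx * qb + q * qbx               # |q|^2 and its x-derivative
    C = mat_scale(0.5j, [[p, qx], [qbx, -p]])
    Cx = mat_scale(0.5j, [[px, qxx], [qbxx, -px]])
    A = mat_add(mat_scale(lam, a), u)
    B = mat_add(mat_scale(lam ** 2, a), mat_scale(lam, u), C)
    Bx = mat_add(mat_scale(lam, ux), Cx)
    R = mat_add(ut, mat_scale(-1, Bx), mat_scale(-1, bracket(B, A)))
    return max(abs(R[i][j]) for i in range(2) for j in range(2))

rng = random.Random(0)
z = lambda: complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
worst = max(zcc_residual(z(), z(), z(), z()) for _ in range(20))
```

A residual at the level of rounding error for arbitrary $\lambda$ shows that the $\lambda^2$ and $\lambda$ coefficients cancel identically and that the $\lambda^0$ coefficient is exactly the NLS equation.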

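Example 3. [AKNS] Take $u = \begin{pmatrix} 0 & -\frac{q_x(x)}{2} \\ \frac{q_x(x)}{2} & 0 \end{pmatrix}$, and let $B(u) = \frac{1}{\lambda}v$ where

$$v(x) = \frac{i}{4}\begin{pmatrix} \cos q(x) & \sin q(x) \\ \sin q(x) & -\cos q(x) \end{pmatrix}.$$

In this case, ZCC is satisfied if and only if $q$
satisfies the Sine-Gordon Equation (SGE) in the form $q_{xt} = \sin q$.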

In the following description of the ZS-AKNS Scheme, we will state definitions 
and describe constructions in a way that works for the general ZS-AKNS case — and 
we will even make occasional remarks explaining what modifications are necessary 
to extend the theory to the more general case of $n \times n$ matrix groups. (For the
full details of this latter generalization the reader should consult [Sa].) However, 
working out details in even full ZS-AKNS generality would involve many distracting 
detours, to discuss various special situations that are irrelevant to the main ideas. 
So, for ease and clarity of exposition, we will carry out most of the further discussion 
of the ZS-AKNS Scheme within the framework of the NLS Hierarchy. 

3. The Uses of Solitons 

There are by now dozens of "soliton equations" , but not only were the three 
examples from the preceding section the first to be discovered, they are also the 
best known, and in many ways still the most interesting and important. In fact, in 
addition to their simplicity and their Hamiltonian nature, each has certain special 
properties that give them a "universal" character, so that they are almost sure to 
arise as approximate models in any physical situation that exhibits these properties. 
In this section I will try to say a little about these special features, and also explain 
how these equations have been used in both theoretical and applied mathematics. 

We have already discussed in some detail the historical background and many 
of the interesting features and applications of the KdV equation, so here I will 
only re-iterate the basic property responsible for its frequent appearance in applied 
problems. In the KdV equation there is an extraordinary balance between the
shock-forming tendency of its non-linear term $uu_x$ and the dispersive tendency of
its linear term $u_{xxx}$, and this balance is responsible for the existence of remarkably
stable configurations (solitons) that scatter elastically off one another under
the KdV evolution. Moreover KdV is the simplest non-dissipative wave equation
with these properties.

The Sine-Gordon equation is even older than the KdV equation; it arose first in
the mid-nineteenth century as the master equation for "pseudo-spherical surfaces"
(i.e., surfaces of constant negative Gaussian curvature immersed in $\mathbb{R}^3$). Without
going into the details (cf. [Da] and [PT], Part I, Section 3.2), the Gauss-Codazzi
equations for such surfaces reduce to the Sine-Gordon equation, so that by the
"Fundamental Theorem of Surface Theory", there is a bijective correspondence
between isometry classes of isometric immersions of the hyperbolic plane into $\mathbb{R}^3$
and solutions to the Sine-Gordon equation. Because of this (and the great interest
in non-Euclidean geometry during the latter half of the last century) a prodigious
amount of effort was devoted to the study of the Sine-Gordon equation by the great
geometers of that period, resulting in a beautiful body of results, most of which can
be found in G. Darboux' superb treatise on surface theory Leçons sur la Théorie
Générale des Surfaces [Da].

One of the most notable features of this theory is the concept of a "Backlund 
Transformation". Starting from any solution of the Sine-Gordon equation, this 
creates a two-parameter family of new solutions. One slight complication is that 
the construction of the new solutions requires solving a certain ordinary differential 
equation. However the so-called "Bianchi Permutability Formula" allows us to 
easily compose Backlund Transformations. That is, once we have found this first 
set of new solutions, we can apply another Backlund Transformation to any one
of them to get still more solutions of Sine-Gordon, and this second family of new 
solutions can be written down explicitly as algebraic functions of the first set, 
without solving any more ODEs. Moreover, we can continue inductively in this 
manner, getting an infinite sequence of families of more and more complex solutions 
to the Sine-Gordon equations (and related pseudospherical surfaces). If we take 
as our starting solution the identically zero (or "vacuum") solution to the Sine- 
Gordon equation, this process can be carried out explicitly. At the first stage we get 
the so-called Kink (or one-soliton) solutions to the Sine-Gordon equation, and the 
corresponding family of pseudospherical surfaces is the Dini family (including the 
well-known pseudosphere). Using the Bianchi Formula once gives rise to the two-soliton
solutions of Sine-Gordon and the corresponding Kuen Surface, and repeated
application leads in principle to all the higher soliton solutions of the Sine-Gordon 
equations (cf. [Da], [PT], loc. cit. for more details). In fact, the classical geometers 
knew so much about the "soliton sector" of solutions to Sine-Gordon that it might 
seem surprising at first that they did not go on to discover "soliton mathematics" 
a century before it actually was. But of course they knew only half the story — 
they knew nothing of the dispersive, non-soliton solutions to Sine-Gordon and had 
no imaginable way to discover the Inverse Scattering Transform, which is the key 
to a full understanding of the space of all solutions. (And finally, they probably 
never looked at Sine-Gordon as an evolution equation for a one-dimensional wave, 
so they didn't notice the strange scattering behavior of the solutions that they had 
calculated.) 

Nevertheless, their work did not go in vain. As soon as it was realized that 
Sine-Gordon was a soliton equation, it was natural to ask whether KdV also had 
an analogous theory of Backlund transformations that, starting from the vacuum
solution, marched up the soliton ladder. It was quickly discovered that this was in
fact so, and while Backlund transformations have remained until recently one of the 
more mysterious parts of soliton theory, each newly discovered soliton equation was 
found to have an associated theory of Backlund transformations. Indeed this soon 
came to be considered a hallmark of the "soliton syndrome" , and a test that one 
could apply to detect soliton behavior. A natural explanation of this relationship 
follows from the Terng-Uhlenbeck Loop Group approach to soliton theory, and we 
will remark on it briefly at the end of this article. For full details see [TU2] . 

The Sine-Gordon equation has also been proposed as a simplified model for 
a unified field theory, and derived as the equation governing the propagation of
dislocations in a crystal lattice, the propagation of magnetic flux in a Josephson
junction transmission line, and many other physical problems. 
The Nonlinear Schrodinger Equation has an interesting pre-history. It was
discovered "in disguise" (and then re-discovered at least three times, cf. [Ri]) in the
early part of this century. In 1906, Da Rios wrote a master's thesis [DaR] under
the direction of Levi-Civita, in which he modeled the free evolution of a thin
vortex-filament in a viscous liquid by a time-dependent curve $\gamma(x, t)$ in $\mathbb{R}^3$ satisfying
the equation $\gamma_t = \gamma_x \times \gamma_{xx}$. Now by the Frenet equations, $\gamma_x \times \gamma_{xx} = \kappa B$
where $\kappa = \kappa(x, t)$ is the curvature and $B$ the binormal, so the filament evolves
by moving in the direction of its binormal with a speed equal to its curvature.
This is now often called the "vortex-filament equation" or the "smoke-ring equation".
In 1971, Hasimoto noticed a remarkable gauge transformation that transforms
the vortex-filament equation to the Nonlinear Schrodinger equation. In
fact, if $\tau(\cdot, t)$ denotes the torsion of the curve $\gamma(\cdot, t)$, then the complex quantity
$q(x, t) = \kappa(x, t)\exp(i\int \tau(\xi, t)\,d\xi)$ satisfies NLS if and only if $\gamma$ satisfies the
vortex-filament equation.

But it is as an "envelope equation" that NLS has recently come into its own. If a 
one-dimensional, amplitude modulated, high-frequency wave is moving in a highly 
dispersive and non-linear medium, then to a good approximation the evolution of 
the wave envelope (i.e., the modulating signal) in a coordinate system moving at 
the group velocity of the wave will satisfy NLS. Without going into detail about 
what these hypotheses mean (cf. [HaK]) they do in fact apply to the light pulses 
travelling along optical fibers that are rapidly becoming the preferred means of 
communicating information at high bit-rates over long distances. Soliton solutions
of NLS seem destined to play a very important role in keeping the Internet and the
World Wide Web from being ruined by success. The story is only half-told at 
present, but the conclusion is becoming clear and it is too good a story to omit. 

For over a hundred years, analogue signals travelling over copper wires provided 
the main medium for point-to-point communication between humans. Early im- 
plementations of this medium (twisted pair) were limited in bandwidth (bits per 
second) to about 100 Kb/s per channel. By going over to digital signalling instead 
of analogue, one can get up to the 1 Mb/s range, and using coaxial cable one can 
squeeze out another several orders of magnitude. Until recently this seemed suffi- 
cient. A bandwidth of about 1 Gb/s is enough to satisfy the needs of the POTS 
(plain old telephone system) network that handles voice communication for the 
entire United States, and that could be handled with coaxial cable and primitive 
fiber optic technology for the trunk lines between central exchanges, and twisted 
pairs for the low bandwidth "last mile" from the exchange to a user's home. And 
as we all know, a coaxial cable has enough bandwidth to provide us with several 
hundred channels of television coming into our homes. 

But suddenly all this has changed. As more and more users are demanding 
very high data-rate services from the global Internet, the capacities of the com- 
munication providers have been stretched to and beyond their limits, and they 
have been desperately trying to keep up. The problem is particularly critical in the 
transoceanic links joining the US to Asia and Europe. Fortunately, a lot of fiber op- 
tic cables have been laid down in the past decade, and even more fortunately these 
cables are being operated at bandwidths that are very far below their theoretical 
limits of about 100 GB/s. To understand the problems involved in using these 
resources more efficiently, it is necessary to understand how a bit is transmitted 
along an optical fiber. In principle it is very simple. In so-called RZ (return-to- 
zero) coding, a pulse of high-frequency laser-light is sent to indicate a one, or not 
sent to indicate a zero. The inverse of the pulse-width in seconds determines the
maximum bandwidth of the channel. A practical lower bound for the pulse-width
is about a pico-second ($10^{-12}$ seconds), giving an upper bound of about 1000 GB/s
for the bandwidth. But of course there are further practical difficulties that limit
data-rates to well below that figure (e.g., the pulses should be well-separated, and
redundancy must be added for error correction), but actual data transmission rates
over optical fibers in the 100 GB/s range seem to be a reasonable goal (using
wavelength-division-multiplexing).

But there are serious technical problems. Over-simplifying somewhat, a major 
obstacle to attaining such rates is the tendency of these very short pico-second 
pulses to disperse as they travel down the optical fiber. For example, if an approximate
square-wave pulse is sent, then dispersion will cause very high error rates after
only several hundreds of miles. However if the pulses are carefully shaped to that of
an appropriate NLS soliton, then the built-in stability of the soliton against dispersion
will preserve the pulse shape over very long distances, and theoretical studies
show that error-free propagation at 10 GB/s across the Pacific is feasible with current
technology, even without multiplexing. (For further details and references see
[LA].) 

4. Nonlinear Schrodinger as a Hamiltonian Flow 

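Let $G$ denote the group $SU(2)$ of unitary $2 \times 2$ complex matrices of determinant
1, and $\mathcal{G}$ its Lie algebra, $su(2)$, of skew-adjoint complex matrices of trace
0. The 3-dimensional real vector space $\mathcal{G}$ has a natural positive definite inner
product (the Killing form), defined by $\langle\langle a, b\rangle\rangle = -\frac{1}{2}\operatorname{tr}(ab)$. It is characterized
(up to a constant factor) by the fact that it is "Ad-invariant", i.e., if $g \in G$ then
$\langle\langle \operatorname{Ad}(g)a, \operatorname{Ad}(g)b\rangle\rangle = \langle\langle a, b\rangle\rangle$, where $\operatorname{Ad}(g) : \mathcal{G} \to \mathcal{G}$ is defined by $\operatorname{Ad}(g)a =
gag^{-1}$. Equivalently, for each element $c$ of $\mathcal{G}$, $\operatorname{ad}(c) : \mathcal{G} \to \mathcal{G}$ defined by $\operatorname{ad}(c)a =
[c, a]$ is skew-adjoint with respect to the Killing form: $\langle\langle [c, a], b\rangle\rangle + \langle\langle a, [c, b]\rangle\rangle = 0$.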
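The skew-adjointness of $\operatorname{ad}(c)$ is easy to test numerically. The sketch below assumes the inner product $\langle\langle a, b\rangle\rangle = -\frac{1}{2}\operatorname{tr}(ab)$ on skew-adjoint, trace-zero $2 \times 2$ matrices; the helper names and the random sampling of matrices are illustrative only.

```python
import random

def mul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(m, n):
    mn, nm = mul(m, n), mul(n, m)
    return [[mn[i][j] - nm[i][j] for j in range(2)] for i in range(2)]

def killing(a, b):
    """<<a, b>> = -1/2 tr(ab); the trace is real for skew-adjoint a and b."""
    ab = mul(a, b)
    return (-0.5 * (ab[0][0] + ab[1][1])).real

def random_su2(rng):
    """A skew-adjoint, trace-zero matrix [[ix, y+iz], [-y+iz, -ix]]."""
    x, y, z = (rng.uniform(-1, 1) for _ in range(3))
    return [[1j * x, complex(y, z)], [complex(-y, z), -1j * x]]

rng = random.Random(1)
a, b, c = (random_su2(rng) for _ in range(3))

# <<[c,a], b>> + <<a, [c,b]>> should vanish, and <<a, a>> should be positive.
skew_defect = killing(bracket(c, a), b) + killing(a, bracket(c, b))
norm_sq = killing(a, a)
```

In the parametrization used by `random_su2`, `norm_sq` equals $x^2 + y^2 + z^2$, which makes the positive definiteness visible directly.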

We denote by $T$ the standard maximal torus of $G$, i.e., the group $\operatorname{diag}(e^{-i\theta}, e^{i\theta})$
of diagonal, unitary matrices of determinant 1, and will denote its Lie algebra
$\operatorname{diag}(-i\theta, i\theta)$ of skew-adjoint, diagonal matrices of trace zero by $\mathcal{T}$. We define the specific
element $a$ of $\mathcal{T}$ by $a = \operatorname{diag}(-i, i)$.

The orthogonal complement, $\mathcal{T}^\perp$, of $\mathcal{T}$ in the Lie algebra $\mathcal{G}$ will play an important role in what
follows. It is clear that $\mathcal{T}^\perp$ is just the space of "off-diagonal" skew-adjoint matrices,
i.e., those with all zeros on the diagonal. (This follows easily from the fact that the
product of a diagonal matrix and an "off-diagonal" matrix is again off-diagonal, and
so of trace zero.) Thus $\mathcal{T}^\perp$ is the space of matrices of the form $\begin{pmatrix} 0 & q \\ -\bar q & 0 \end{pmatrix}$ where
$q \in \mathbb{C}$, and this gives a natural complex structure to the 2-dimensional real vector
space $\mathcal{T}^\perp$.

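Note that $\mathcal{T}$ is just the kernel (or zero eigenspace) of $\operatorname{ad}(a)$. Since $\operatorname{ad}(a)$ is
skew-adjoint with respect to the Killing form, it follows that $\operatorname{ad}(a)$ leaves $\mathcal{T}^\perp$ invariant,
and we will denote $\operatorname{ad}(a)$ restricted to $\mathcal{T}^\perp$ by $J : \mathcal{T}^\perp \to \mathcal{T}^\perp$. A trivial calculation
shows that $J\begin{pmatrix} 0 & q \\ -\bar q & 0 \end{pmatrix} = \begin{pmatrix} 0 & -2iq \\ -2i\bar q & 0 \end{pmatrix}$.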

Remark. In the generalization to $SU(n)$, we choose $a$ to be a diagonal element
of $su(n)$ that is "regular", i.e., has distinct eigenvalues. Then the Lie algebra $\mathcal{T}$ of
the maximal torus $T$ (the diagonal subgroup of $SU(n)$) is still all diagonal
skew-adjoint operators of trace zero, and is again the null-space of $\operatorname{ad}(a)$. Its orthogonal
complement, $\mathcal{T}^\perp$, in $su(n)$ is thus still invariant under $\operatorname{ad}(a)$, but now it is no longer
a single complex eigenspace, but rather the direct sum of complex $\operatorname{ad}(a)$ eigenspaces
(the so-called "root spaces").

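We define the phase space $P$ for the NLS Hierarchy by $P = \mathcal{S}(\mathbb{R}, \mathcal{T}^\perp)$, i.e., $P$
consists of all "potentials" $u$ that are Schwartz class maps of $\mathbb{R}$ into $\mathcal{T}^\perp$:
$x \mapsto u(x) = \begin{pmatrix} 0 & q(x) \\ -\bar q(x) & 0 \end{pmatrix}$. Clearly $u \mapsto q$ establishes a canonical identification of
$P$ with the space $\mathcal{S}(\mathbb{R}, \mathbb{C})$ of all complex-valued Schwartz class functions on the
line. We define an $L^2$ inner product on $P$, making it into a real pre-hilbert space,
by $\langle u_1, u_2\rangle = \int_{-\infty}^{\infty} \langle\langle u_1(x), u_2(x)\rangle\rangle\,dx = -\frac{1}{2}\int_{-\infty}^{\infty} \operatorname{tr}(u_1(x)u_2(x))\,dx$. When this
is written in terms of $q$ we find $\langle u_1, u_2\rangle = \operatorname{Re}\left(\int_{-\infty}^{\infty} q_1(x)\overline{q_2(x)}\,dx\right)$. And finally, if
we decompose $q_1$ and $q_2$ into their real and imaginary parts, $q_j = v_j + iw_j$, then
$\langle u_1, u_2\rangle = \int_{-\infty}^{\infty} (v_1v_2 + w_1w_2)\,dx$.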

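We "extend" $J : \mathcal{T}^\perp \to \mathcal{T}^\perp$ to act pointwise on $P$, i.e., $(Ju)(x) = J(u(x))$, and
since $J$ is skew-adjoint, we can define a skew bilinear form $\Omega$ on $P$ by

$$\begin{aligned}
\Omega(u_1, u_2) &= \langle J^{-1}u_1, u_2\rangle \\
&= \operatorname{Re}\left(\int_{-\infty}^{\infty} \frac{1}{-2i}\, q_1 \bar q_2\,dx\right) \\
&= \operatorname{Re}\left(\frac{i}{2}\int_{-\infty}^{\infty} q_1 \bar q_2\,dx\right) \\
&= -\frac{1}{2}\operatorname{Im}\left(\int_{-\infty}^{\infty} q_1 \bar q_2\,dx\right).
\end{aligned}$$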
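The formula for $J$ and the pointwise form of the identity for $\Omega$ can both be checked with a few lines of matrix arithmetic. The sketch assumes $a = \operatorname{diag}(-i, i)$, $J = \operatorname{ad}(a)$ on off-diagonal matrices $\begin{pmatrix} 0 & q \\ -\bar q & 0 \end{pmatrix}$, and the pairing $-\frac{1}{2}\operatorname{tr}(u_1u_2)$; the sample values of $q_1$ and $q_2$ and the helper names are arbitrary.

```python
def off_diagonal(q):
    """The matrix [[0, q], [-conj(q), 0]] associated with a complex number q."""
    return [[0, q], [-q.conjugate(), 0]]

def mul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(m, n):
    mn, nm = mul(m, n), mul(n, m)
    return [[mn[i][j] - nm[i][j] for j in range(2)] for i in range(2)]

def pairing(u1, u2):
    """Pointwise inner product -1/2 tr(u1 u2)."""
    p = mul(u1, u2)
    return (-0.5 * (p[0][0] + p[1][1])).real

a = [[-1j, 0], [0, 1j]]
q1, q2 = complex(0.3, -1.2), complex(-0.7, 0.4)   # arbitrary sample values

# J u = [a, u] should correspond to multiplying q by -2i.
Ju = bracket(a, off_diagonal(q1))
J_expected = off_diagonal(-2j * q1)
J_gap = max(abs(Ju[i][j] - J_expected[i][j]) for i in range(2) for j in range(2))

# J^{-1} multiplies q by i/2, so the integrand of Omega is -1/2 Im(q1 conj(q2)).
omega_density = pairing(off_diagonal(0.5j * q1), off_diagonal(q2))
omega_gap = abs(omega_density - (-0.5 * (q1 * q2.conjugate()).imag))
```

Since the integrand identity holds at every point, integrating over the line gives the stated expression for $\Omega(u_1, u_2)$.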

Considered as a differential 2-form on the real topological vector space $P$, $\Omega$ is
constant and hence closed. On the other hand, since $J : P \to P$ is injective, it
follows that $\Omega$ is weakly non-degenerate, and hence a symplectic structure on $P$.

From the definition of $\Omega$ we have $\Omega(Ju_1, u_2) = \langle u_1, u_2\rangle$, thus if $F : P \to \mathbb{R}$ has
a Riemannian gradient $\nabla F$ then $\Omega(J(\nabla F)_{u_1}, u_2) = \langle(\nabla F)_{u_1}, u_2\rangle = dF_{u_1}(u_2)$, and
so $\nabla_s F = J\nabla F$. In particular, if $F_1$ and $F_2$ are any two Hamiltonian functions on
$P$ then their Poisson bracket is given by the formula $\{F_1, F_2\} = \Omega(\nabla_s F_2, \nabla_s F_1) =
\Omega(J\nabla F_2, J\nabla F_1) = \langle\nabla F_2, J\nabla F_1\rangle = \langle J\nabla F_1, \nabla F_2\rangle$.

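A Calculus of Variations functional on $P$, $F : P \to \mathbb{R}$, will be of the form
$F(u) = \int_{-\infty}^{\infty} \tilde F(v, w, v_x, w_x, \dots)\,dx$, where $q = v + iw$, and the differential of $F$ is
given by $dF_u(\delta u) = \int_{-\infty}^{\infty} \left(\frac{\delta F}{\delta v}\delta v + \frac{\delta F}{\delta w}\delta w\right) dx$, or equivalently

$$dF_u(\delta u) = \operatorname{Re}\left(\int_{-\infty}^{\infty} \left(\frac{\delta F}{\delta v} + i\frac{\delta F}{\delta w}\right)(\delta v - i\,\delta w)\,dx\right),$$

where as usual $\frac{\delta F}{\delta v} = \frac{\partial \tilde F}{\partial v} - \frac{\partial}{\partial x}\left(\frac{\partial \tilde F}{\partial v_x}\right) + \frac{\partial^2}{\partial x^2}\left(\frac{\partial \tilde F}{\partial v_{xx}}\right) - \cdots$, and a similar expression for $\frac{\delta F}{\delta w}$.
However, it will be more convenient to give the polynomial differential operator
$\tilde F$ as a function of $q = v + iw$, $\bar q = v - iw$, $q_x = v_x + iw_x$, $\bar q_x = v_x - iw_x, \dots$ instead of as
a function of $v$, $w$ and their derivatives. Since $v = \frac{1}{2}(q + \bar q)$ and $w = \frac{1}{2i}(q - \bar q)$, by the
chain-rule, $\frac{\partial \tilde F}{\partial \bar q} = \frac{1}{2}\left(\frac{\partial \tilde F}{\partial v} + i\frac{\partial \tilde F}{\partial w}\right)$, with similar formulas for $\frac{\partial \tilde F}{\partial \bar q_x}$, $\frac{\partial \tilde F}{\partial \bar q_{xx}}$, etc. Thus if we
define $\frac{\delta F}{\delta \bar q} = \frac{\partial \tilde F}{\partial \bar q} - \frac{\partial}{\partial x}\left(\frac{\partial \tilde F}{\partial \bar q_x}\right) + \frac{\partial^2}{\partial x^2}\left(\frac{\partial \tilde F}{\partial \bar q_{xx}}\right) - \cdots$, then $\frac{\delta F}{\delta \bar q} = \frac{1}{2}\left(\frac{\delta F}{\delta v} + i\frac{\delta F}{\delta w}\right)$, and it follows
that $dF_u(\delta u) = 2\operatorname{Re}\left(\int_{-\infty}^{\infty} \frac{\delta F}{\delta \bar q}\,\overline{\delta q}\,dx\right)$, where $\delta q = \delta v + i\,\delta w$, so $\delta u = \begin{pmatrix} 0 & \delta q \\ -\overline{\delta q} & 0 \end{pmatrix}$.

Recalling the formulae for $\langle u_1, u_2\rangle$, it follows that $\nabla F_u = \begin{pmatrix} 0 & 2\frac{\delta F}{\delta \bar q} \\ -2\overline{\frac{\delta F}{\delta \bar q}} & 0 \end{pmatrix}$, and so
$\nabla_s F_u = J\nabla F_u = \begin{pmatrix} 0 & -4i\frac{\delta F}{\delta \bar q} \\ -4i\,\overline{\frac{\delta F}{\delta \bar q}} & 0 \end{pmatrix}$. Thus, expressed in terms of $q$, the Hamiltonian flow in
$P$ defined by $F$ is $q_t = -4i\frac{\delta F}{\delta \bar q}$.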

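If we take for $F$ the functional with integrand $-\frac{1}{16}\operatorname{tr}(u^4 + u_x^2) = \frac{1}{8}(|q_x|^2 - |q|^4)$, then $\tilde F(q, \bar q, q_x, \bar q_x) = \frac{1}{8}(q_x\bar q_x - q^2\bar q^2)$
and $\frac{\delta F}{\delta \bar q} = -\frac{1}{4}q^2\bar q - \frac{1}{8}\frac{\partial}{\partial x}(q_x) = -\frac{1}{8}(q_{xx} + 2|q|^2 q)$, and the Hamiltonian equation $q_t = -4i\frac{\delta F}{\delta \bar q}$ is
$q_t = \frac{i}{2}(q_{xx} + 2|q|^2 q)$, which is NLS.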

5. The Nonlinear Schrodinger Hierarchy 

For each potential $u$ in $P$ and complex number $\lambda$ we define an element $A(u, \lambda)$ of
$C^\infty(\mathbb{R}, sl(2, \mathbb{C}))$ by $A(u, \lambda) = a\lambda + u = \begin{pmatrix} -i\lambda & q \\ -\bar q & i\lambda \end{pmatrix}$. $A(u, \lambda)$ will play an important
role in what follows, and you should think of it as a zero-order differential
operator on $C^\infty(\mathbb{R}, gl(2, \mathbb{C}))$, acting by pointwise multiplication on the left. We
are now going to imitate the construction of the KdV Hierarchy. That is, we will
look for a sequence of maps $u \mapsto B_j(u, \lambda)$ of $P$ into $C^\infty(\mathbb{R}, sl(2, \mathbb{C}))$ (polynomials
of degree $j$ in $\lambda$) such that the sequence of ZCC Lax Equations $\left(\frac{\partial}{\partial x} - A\right)_t = [B_j, \frac{\partial}{\partial x} - A]$ is
a sequence of commuting Hamiltonian flows on $P$, which for $j = 2$ is the NLS flow.

NLS Hierarchy Theorem. For each $u$ in $P$ there exists a sequence of smooth maps $Q_k(u) : \mathbf{R} \to su(2)$ with the following properties:

a) The $Q_k(u)$ can be determined recursively by:

i) $Q_0(u)$ is the constant matrix $a$.

ii) $[a, Q_{k+1}(u)] = (Q_k(u))_x + [Q_k(u), u]$,

iii) $(Q_k(u))_x + [Q_k(u), u]$ is off-diagonal.

b) If we define $B_j(u, \lambda) = \sum_{k=0}^{j} Q_k(u)\lambda^{j-k}$, and consider $B_j(u, \lambda)$ as a zero-order linear differential operator acting by pointwise matrix multiplication on elements $\psi$ of $C^\infty(\mathbf{R}, gl(2, \mathbf{C}))$, then the conditions ii) and iii) of a) are equivalent to demanding that the commutators $[B_j(u, \lambda), \frac{\partial}{\partial x} - A(u, \lambda)]$ are independent of $\lambda$ and have only off-diagonal entries. In fact these commutators have the values:

$$[B_j(u, \lambda), \tfrac{\partial}{\partial x} - A(u, \lambda)] = [a, Q_{j+1}(u)] = (Q_j(u))_x - [u, Q_j(u)].$$

c) The matrix elements of $Q_k(u)$ can be determined so that they are polynomials in the derivatives (up to order $k - 1$) of the matrix entries of $u$, and this added requirement makes them uniquely determined. We can then regard $Q_k$ as a map of $P$ into $C^\infty(\mathbf{R}, su(2))$. Similarly, for each real $\lambda$, $u \mapsto B_j(u, \lambda)$ is a map of $P$ into $C^\infty(\mathbf{R}, su(2))$.

d) It follows that the sequence of ZCC Lax Equations, $(\frac{\partial}{\partial x} - A)_t = [B_j, \frac{\partial}{\partial x} - A]$ (or equivalently $u_t = [a, Q_{j+1}(u)]$), determine flows on $P$, the so-called higher order NLS flows. (The $j$-th of these is called the $j$-th NLS flow and the second is the usual NLS flow.)

e) If we define Hamiltonians on $P$ by $H_k(u) = -\frac{1}{k+1}\int_{-\infty}^{\infty} \mathrm{tr}(Q_{k+2}(u)a)\,dx$, then $(\nabla H_k)_u$ is the off-diagonal part of $Q_{k+1}(u)$.

f) It follows that the $k$-th NLS flow is Hamiltonian, and in fact is given by $u_t = (\nabla_s H_k)_u$.

g) The Hamiltonian functions $H_k$ are in involution, i.e., the Poisson brackets $\{H_k, H_l\}$ all vanish, so that all the NLS flows on $P$ commute.

Remark. We will give part of the proof of this important theorem here, and finish the proof later when we have developed more machinery. However, first we comment on the changes that are necessary when we go from 2 to $n$ dimensions (i.e., replace $gl(2, \mathbf{C})$ by $gl(n, \mathbf{C})$, and $su(2)$ by $su(n)$). In fact, surprisingly few changes are necessary. The maximal torus $T$ still consists of diagonal unitary matrices of determinant 1 but now has dimension $(n - 1)$ rather than 1. We replace $a$ by any regular element of $T$ (i.e., one with distinct elements on the diagonal). This is equivalent to the key condition that $T$ is the centralizer of $a$. The biggest change is that to get the family of commuting Hamiltonian flows we must now choose a second element $b$ of $T$, and replace $Q_j(u) = Q_{a,j}(u)$ by the more general $Q_{b,j}(u)$, and the $B_j(u, \lambda) = B_{a,j}(u, \lambda)$ by the more general $B_{b,j}(u, \lambda) = \sum_{k=0}^{j} Q_{b,k}(u)\lambda^{j-k}$. The only further change is that i) of a) now reads "$Q_{b,0}(u)$ is the constant matrix $b$." Mutatis mutandis, everything else remains the same. For full details, see [Sa].

Proof. Some easier parts of the proof will be indicated here, while other more 
difficult steps will be deferred until after we discuss the ZS-AKNS direct scattering 
theory, at which point they will be much easier to demonstrate. 

The coefficient of $\lambda^{j-k}$ in the commutator $[B_j(u, \lambda), \frac{\partial}{\partial x} - A(u, \lambda)]$ is easily computed, and for $k = 0$ to $j - 1$ we find $-(Q_k(u))_x - [Q_k(u), u] - [Q_{k+1}(u), a]$, while for $k = j$ (i.e., the term independent of $\lambda$) we get $-(Q_j(u))_x - [Q_j(u), u]$, and b) is now immediate.

If we write $Q_k(u)$ as the sum of its diagonal part, $T_k(u)$, and its off-diagonal part, $P_k(u)$, then since $\mathrm{ad}(a)$ annihilates diagonal matrices and is an isomorphism on the off-diagonal matrices,

$$[a, Q_{k+1}(u)] = \mathrm{ad}(a)(T_{k+1}(u)) + \mathrm{ad}(a)(P_{k+1}(u)) = \mathrm{ad}(a)(P_{k+1}(u)),$$

so by ii) of a):

$$P_{k+1}(u) = \mathrm{ad}(a)^{-1}\big((P_k(u))_x + [T_k(u), u]\big).$$

(We have used the fact that, since $u$ is off-diagonal, $[u, T_k(u)]$ is off-diagonal while $[u, P_k(u)]$ is diagonal.)
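
The two facts about ad(a) used here are easy to confirm numerically. The following Python sketch is only an illustration: it assumes the conventions a = diag(-i, i) and u = (0, q; -conj(q), 0) from the earlier sections (they are not restated in this section), and all helper names are hypothetical.

```python
# Sketch: how ad(a) = [a, .] acts on diagonal and off-diagonal 2x2 matrices.
# Assumption: a = diag(-i, i) and u = [[0, q], [-conj(q), 0]].

A = [[-1j, 0], [0, 1j]]

def mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm(X, Y):
    """Commutator [X, Y] = XY - YX."""
    XY, YX = mul(X, Y), mul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]

def ad_a_inverse(X):
    """Inverse of ad(a) on off-diagonal matrices: (0 p; r 0) -> (0 ip/2; -ir/2 0)."""
    return [[0, 0.5j * X[0][1]], [-0.5j * X[1][0], 0]]

def close(X, Y, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

q = 0.3 + 0.8j
u = [[0, q], [-q.conjugate(), 0]]
T = [[0.7j, 0], [0, -0.7j]]           # a sample diagonal element of su(2)
P = [[0, 1 - 2j], [-(1 + 2j), 0]]     # a sample off-diagonal element of su(2)

ZERO = [[0, 0], [0, 0]]
ad_kills_diagonal = close(comm(A, T), ZERO)
ad_inverts_offdiagonal = close(comm(A, ad_a_inverse(P)), P)
uT, uP = comm(u, T), comm(u, P)
uT_is_offdiagonal = abs(uT[0][0]) < 1e-12 and abs(uT[1][1]) < 1e-12
uP_is_diagonal = abs(uP[0][1]) < 1e-12 and abs(uP[1][0]) < 1e-12
```

All four boolean flags come out true for these sample matrices, matching the splitting of the recursion into an off-diagonal and a diagonal part.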

Next note that condition iii) of statement a) can now be written as $(T_j(u))_x = [u, P_j(u)]$ (because $[u, P_j(u)]$ is diagonal while $[u, T_j(u)]$ is off-diagonal). So we can write

$$T_{k+1}(u) = \int_{-\infty}^{x} [u, P_{k+1}(u)]\,dx,$$

where of course the indefinite integral is to be taken matrix element by matrix element. Together, the latter two displayed equations give an explicit recursive definition of $Q_{k+1} = P_{k+1} + T_{k+1}$ in terms of $Q_k = P_k + T_k$.

For example, since $Q_0(u) = a$ we conclude that $P_0(u) = 0$ and $T_0(u) = a$. Then the formula for $P_{k+1}$ gives $P_1(u) = \mathrm{ad}(a)^{-1}(0 + [a, u]) = u$, and since $[u, u] = 0$, the formula for $T_{k+1}$ gives $T_1(u) = 0$, and therefore $Q_1(u) = P_1(u) = u$.

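Continuing, we find next that $P_2(u) = \mathrm{ad}(a)^{-1}(u_x) = \begin{pmatrix} 0 & \frac{i}{2}q_x \\ \frac{i}{2}\bar q_x & 0 \end{pmatrix}$, and $(T_2(u))_x = [u, P_2(u)] = \begin{pmatrix} \frac{i}{2}(q\bar q_x + \bar q q_x) & 0 \\ 0 & -\frac{i}{2}(q\bar q_x + \bar q q_x) \end{pmatrix}$, which gives by integration $T_2(u) = \begin{pmatrix} \frac{i}{2}|q|^2 & 0 \\ 0 & -\frac{i}{2}|q|^2 \end{pmatrix}$ and $Q_2(u) = P_2(u) + T_2(u) = \begin{pmatrix} \frac{i}{2}|q|^2 & \frac{i}{2}q_x \\ \frac{i}{2}\bar q_x & -\frac{i}{2}|q|^2 \end{pmatrix}$.

(By what we have seen earlier, this shows that the second flow is indeed the NLS flow.)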
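
The first two steps of the recursion can be checked at a single point for a sample potential. The Python sketch below is an illustration under stated assumptions: it again takes a = diag(-i, i), and the test function q(x) = exp(-x^2 + 2ix) is a hypothetical choice made only so that q_x and the derivative of |q|^2 are known in closed form.

```python
import cmath
import math

A = [[-1j, 0], [0, 1j]]   # assumed convention a = diag(-i, i)

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm(X, Y):
    XY, YX = mul(X, Y), mul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(2)] for i in range(2)]

def ad_a_inverse(X):
    """Inverse of ad(a) on off-diagonal matrices."""
    return [[0, 0.5j * X[0][1]], [-0.5j * X[1][0], 0]]

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

def q(x):
    """Hypothetical sample potential entry q(x) = exp(-x^2 + 2ix)."""
    return cmath.exp(-x * x + 2j * x)

def q_x(x):
    """Exact derivative of the sample q."""
    return (-2 * x + 2j) * q(x)

def offdiag(p):
    """Matrix (0, p; -conj(p), 0); u and u_x both have this shape."""
    return [[0, p], [-p.conjugate(), 0]]

x0 = 0.3
u, u_x = offdiag(q(x0)), offdiag(q_x(x0))

P1 = ad_a_inverse(comm(A, u))   # recursion step for P_1; should reproduce u
P2 = ad_a_inverse(u_x)          # recursion step for P_2
T2_x = comm(u, P2)              # x-derivative of T_2

# For this q, |q|^2 = exp(-2 x^2), so d/dx |q|^2 = -4 x exp(-2 x^2).
abs2_x = -4 * x0 * math.exp(-2 * x0 * x0)
```

The diagonal entries of `T2_x` agree with plus and minus (i/2) times `abs2_x`, which is the statement that $T_2(u)$ integrates to $\pm\frac{i}{2}|q|^2$ on the diagonal.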

We could continue for another several steps, and at each stage, after computing $P_j(u)$ and then $[P_j(u), u]$, the anti-derivative of the latter turns out to be in $\mathcal{S}(\mathbf{R}, T)$, so $Q_j(u) = P_j(u) + T_j(u)$ is in $\mathcal{S}(\mathbf{R}, su(2))$. (Note that this is clearly equivalent to the statement that $\int_{-\infty}^{\infty} [u, P_{k+1}(u)]\,dx = 0$.)

Unfortunately no one has come up with a simple inductive proof of that fact, so 
at this stage we are faced with the unpleasant possibility that our recursive process 
might lead to some Tj(u) (and hence Qj(u)) that does not vanish at infinity. Later 
on, after we have discussed the scattering theory for the ZS-AKNS Scheme, we will 
find a simple argument to show that this cannot happen, and at that point we 
will have a proof of statements a) through d). Similarly, I do not know a proof of 
statement e) that avoids scattering theory so I will again defer the proof. 

Recalling that $\mathrm{ad}(a)$ (i.e., bracketing with $a$) annihilates diagonal matrices, it follows from e) that $\nabla_s H_k = J(\nabla H_k) = [a, Q_{k+1}]$, and so by d) the $k$-th NLS flow is given by $u_t = (\nabla_s H_k)_u$, which is f).

For g), recall $\{H_k, H_l\} = \langle J\nabla H_k, \nabla H_l\rangle = \langle [a, Q_{k+1}(u)], Q_{l+1}(u)\rangle$, and using this formula, the ad-invariance of the Killing form, and the recursion relation $[a, Q_{j+1}(u)] = (Q_j(u))_x - [u, Q_j(u)]$, we will give an inductive argument that the $H_k$ are in involution.

Lemma 1.

a) $\langle [u, Q_j(u)], Q_k(u)\rangle + \langle Q_j(u), [u, Q_k(u)]\rangle = 0$.

b) $\langle [u, Q_j(u)], Q_j(u)\rangle = 0$.

c) $\langle (Q_j(u))_x, Q_k(u)\rangle + \langle Q_j(u), (Q_k(u))_x\rangle = 0$.

d) $\langle (Q_j(u))_x, Q_j(u)\rangle = 0$.

e) $\{H_j, H_{j-1}\} = 0$.

Proof. Statement a) is just a special case of the ad-invariance of the Killing form, and b) is a special case of a).

Recalling that $\langle u_1, u_2\rangle = -\int_{-\infty}^{\infty} \mathrm{tr}(u_1 u_2)\,dx$, it follows that

$$\langle (Q_j(u))_x, Q_k(u)\rangle + \langle Q_j(u), (Q_k(u))_x\rangle = -\int_{-\infty}^{\infty} \frac{d}{dx}\,\mathrm{tr}(Q_j(u)Q_k(u))\,dx,$$

which is clearly zero since $\mathrm{tr}(Q_j(u)Q_k(u))$ vanishes at infinity. This proves c), and d) is just a special case of c).

Since $\{H_j, H_{j-1}\} = \langle [a, Q_{j+1}(u)], Q_j(u)\rangle$, the recursion formula for $[a, Q_{j+1}(u)]$ gives $\{H_j, H_{j-1}\} = \langle (Q_j(u))_x, Q_j(u)\rangle - \langle [u, Q_j(u)], Q_j(u)\rangle$, and e) now follows from b) and d). ■

Lemma 2. $\{H_k, H_l\} = -\{H_{k-1}, H_{l+1}\}$.

Proof. $\{H_k, H_l\} = \langle [a, Q_{k+1}(u)], Q_{l+1}(u)\rangle$, so that using the recursion formula for $[a, Q_{k+1}(u)]$ we find:

$$\{H_k, H_l\} = \langle (Q_k(u))_x, Q_{l+1}(u)\rangle - \langle [u, Q_k(u)], Q_{l+1}(u)\rangle,$$

and using a) of Lemma 1,

$$\{H_k, H_l\} = \langle (Q_k(u))_x, Q_{l+1}(u)\rangle + \langle Q_k(u), [u, Q_{l+1}(u)]\rangle.$$

Next, using the recursion formula for $[a, Q_{l+2}(u)]$, we find that $\{H_k, H_l\} = \langle (Q_k(u))_x, Q_{l+1}(u)\rangle + \langle Q_k(u), (Q_{l+1}(u))_x\rangle - \langle Q_k(u), [a, Q_{l+2}(u)]\rangle$, and we recognize the third term as $-\{H_{k-1}, H_{l+1}\}$, while the sum of the first two terms vanishes by c) of Lemma 1. ■

The proof that $\{H_k, H_l\} = 0$ for any $k$ and $l$ is now easy. We can suppose that $k > l$, and we apply Lemma 2 repeatedly, decreasing the larger index by one and increasing the smaller by one, until we "meet in the middle". At this point we have an identity $\{H_k, H_l\} = \pm\{H_m, H_n\}$ where $m = n$ if $k$ and $l$ have the same parity, while $m = n + 1$ if they have opposite parity. In the first case we get $\{H_k, H_l\} = 0$ by the anti-symmetry of Poisson Brackets, and in the second case $\{H_k, H_l\} = 0$ by e) of Lemma 1.
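
The index bookkeeping in this "meet in the middle" argument can be made explicit. The short Python sketch below is illustrative only (the function name is hypothetical): it applies the rule of Lemma 2 until the two indices differ by at most one and reports the accumulated sign together with the final pair of indices.

```python
def meet_in_the_middle(k, l):
    """Apply {H_k, H_l} = -{H_(k-1), H_(l+1)} until the indices differ by at most one.

    Returns (sign, m, n) with {H_k, H_l} = sign * {H_m, H_n}; expects k >= l.
    """
    if k < l:
        raise ValueError("expected k >= l")
    sign = 1
    while k - l > 1:
        k, l, sign = k - 1, l + 1, -sign
    return sign, k, l

same_parity = meet_in_the_middle(5, 1)       # ends with m == n
opposite_parity = meet_in_the_middle(4, 1)   # ends with m == n + 1
```

When the final indices are equal the bracket vanishes by anti-symmetry, and when they differ by one it vanishes by e) of Lemma 1, so the sign returned here never matters for the conclusion.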

This finishes our partial proof of the NLS Hierarchy Theorem; we will complete 
the proof later. 

6. ZS-AKNS Direct Scattering Theory 
1. Statements of Results 

For each potential $u$ in our phase space $\mathcal{S}(\mathbf{R}, T^\perp)$ we would like to define scattering data, by which we will mean a measure of the asymptotic behavior of solutions of the parallel transport equation, $\psi_x = A(u, \lambda)\psi = (a\lambda + u)\psi$, for $x$ near $\pm\infty$. Of course, to have a useful Inverse Scattering Method, the scattering data for $u$ must be such that it allows us to recover $u$. On the other hand, it is preferable to make the scattering data as simple as possible, so it should be "just enough" to recover $u$. Direct Scattering Theory refers to the search for such good minimal scattering data, and to the explicit determination of the image of the Direct Scattering Transform (the map from $u \in \mathcal{S}(\mathbf{R}, T^\perp)$ to the scattering data of $u$). Identifying this image precisely is of course essential for a rigorous definition of the Inverse Scattering Transform that recovers $u$ from its scattering data.

It turns out that, in discussing the asymptotic behavior of solutions $\psi$ of the parallel transport equation near infinity, it is more convenient to deal not with $\psi$ itself, but rather with the related function $\phi = \psi(x)e^{-a\lambda x}$, which satisfies a slightly modified equation.

Proposition 1. If $\psi$ and $\phi$ are maps of $\mathbf{R}$ into $SL(2, \mathbf{C})$ that are related by $\phi(x) = \psi(x)e^{-a\lambda x}$, then $\psi$ satisfies the parallel transport equation, $\psi_x = (a\lambda + u)\psi$, if and only if $\phi$ satisfies what we shall call the "modified parallel transport equation", $\phi_x = [a\lambda, \phi] + u\phi$.

Proof. Clearly $\phi_x = \psi_x e^{-a\lambda x} - \psi e^{-a\lambda x}a\lambda = (a\lambda + u)\psi e^{-a\lambda x} - \phi a\lambda$, and the result follows. ■
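
Proposition 1 can also be checked numerically. The Python sketch below is an illustration under simplifying assumptions: it takes a = diag(-i, i) and a constant sample value of q (a constant potential is not of Schwartz class, but the identity being tested is purely local), so that $\psi(x) = \exp((a\lambda + u)x)$ is available in closed form. The derivative of $\phi$ is approximated by a central difference.

```python
import cmath

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_traceless(X):
    """exp(X) for a trace-free 2x2 matrix, using X^2 = -det(X) I."""
    mu = cmath.sqrt(-(X[0][0] * X[1][1] - X[0][1] * X[1][0]))
    c = cmath.cosh(mu)
    s = cmath.sinh(mu) / mu if abs(mu) > 1e-12 else 1.0
    return [[c * (i == j) + s * X[i][j] for j in range(2)] for i in range(2)]

LAM = 0.7 + 0.2j        # sample spectral parameter
Q0 = 0.5 - 0.4j         # hypothetical constant value of q

def psi(x):
    """psi(x) = exp((a lam + u) x) solves psi_x = (a lam + u) psi for constant u."""
    return expm_traceless([[-1j * LAM * x, Q0 * x],
                           [-Q0.conjugate() * x, 1j * LAM * x]])

def phi(x):
    """phi = psi exp(-a lam x), where exp(-a lam x) = diag(e^{i lam x}, e^{-i lam x})."""
    E = [[cmath.exp(1j * LAM * x), 0], [0, cmath.exp(-1j * LAM * x)]]
    return mul(psi(x), E)

def modified_rhs(x):
    """Right-hand side [a lam, phi] + u phi of the modified equation."""
    al = [[-1j * LAM, 0], [0, 1j * LAM]]
    u = [[0, Q0], [-Q0.conjugate(), 0]]
    p = phi(x)
    t1, t2, t3 = mul(al, p), mul(p, al), mul(u, p)
    return [[t1[i][j] - t2[i][j] + t3[i][j] for j in range(2)] for i in range(2)]

def phi_x(x, h=1e-5):
    """Central finite-difference approximation of d(phi)/dx."""
    p, m = phi(x + h), phi(x - h)
    return [[(p[i][j] - m[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

X0 = 0.4
lhs, rhs = phi_x(X0), modified_rhs(X0)
residual = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The residual is at the level of the finite-difference error, which is what the proposition predicts.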

Definition. For $u$ in $\mathcal{S}(\mathbf{R}, T^\perp)$, we will call $m^u(x, \lambda)$ a normalized eigenfunction of $u$ with eigenvalue $\lambda$ if it satisfies the modified parallel transport equation, $m^u_x = [a\lambda, m^u] + um^u$, and if in addition:

1) $\lim_{x \to -\infty} m^u(x, \lambda) = I$.

2) $\sup_{x \in \mathbf{R}} \|m^u(x, \lambda)\| < \infty$.

It is these normalized eigenfunctions $m^u$ that will play the role of scattering data in this theory; they are analogous to the Jost solutions of the Schrodinger equation in the KdV theory. Note that condition 2) just means that each matrix element of $m^u(x, \lambda)$ is a bounded function of $x$.

A complete theory of normalized eigenfunctions will be found in [BC1]. We will 
next state the basic results proved there as three theorems, Theorem A, Theorem 
B, and Theorem C, reformulating things somewhat so as to make the statements 
better adapted to the Terng-Uhlenbeck version of inverse scattering theory that we 
will explain later. Then we will sketch the proofs of these results, leaving it to the 
interested reader to fill in many of the details from the original paper of Beals and 
Coifman. 

We will denote $\mathcal{S}(\mathbf{R}, T^\perp)$ by $P$ in what follows.

Theorem A. For each $u$ in $P$ there is a unique normalized eigenfunction $m^u(x, \lambda)$ for $u$ with eigenvalue $\lambda$, except for $\lambda$ in $\mathbf{R} \cup D_u$, where $D_u$ is a bounded, discrete subset of $\mathbf{C} \setminus \mathbf{R}$. Moreover, as a function of $\lambda$, for each fixed $x$ in $\mathbf{R}$, $m^u(x, \lambda)$ is meromorphic in $\mathbf{C} \setminus \mathbf{R}$ with poles at the points of $D_u$.

Note that a matrix-valued function of a complex variable is said to be holomorphic (resp., meromorphic) in a region $O$ if each of its matrix elements is holomorphic (resp., meromorphic) in $O$, and a pole of such a function is a pole of any of its matrix elements.

Definition. An element $u$ of $P$ will be called a regular potential if $D_u$ is a finite set and if, for all real $x$, the function $m^u(x, \lambda)$ with $\lambda$ in the upper half-plane $\mathbf{C}_+$ has smooth boundary values $m^u_+(x, r)$ on the real axis, and similarly $m^u(x, \lambda)$ with $\lambda$ in the lower half-plane $\mathbf{C}_-$ has smooth boundary values $m^u_-(x, r)$. We will denote the set of regular potentials by $P_0$.

Theorem B. The space $P_0$ of regular potentials is open and dense in the space $P = \mathcal{S}(\mathbf{R}, T^\perp)$ of all potentials.

It is an essential fact that the normalized eigenfunctions $m^u(x, \lambda)$ have asymptotic expansions as $|\lambda|$ tends to infinity. Since the precise nature of these expansions will be important, we will give the relevant definitions in some detail.

A matrix-valued function $f(\lambda)$ defined for complex $\lambda$ with $|\lambda|$ sufficiently large is said to have an asymptotic expansion at infinity if there exists a sequence of matrices $f_n$ so that, for each $k$, $f(\lambda) - \sum_{j=0}^{k} f_j\lambda^{-j} = o(|\lambda|^{-k})$. It is easy to see inductively that the $f_n$ are uniquely determined, and we write $f \sim \sum_j f_j\lambda^{-j}$.
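
A scalar example may help fix the definition. The Python sketch below is a hypothetical illustration (a 1x1 "matrix" chosen for convenience, not taken from the text): for $f(\lambda) = 1/(1 - c/\lambda)$ the coefficients are $f_j = c^j$, and the remainder after $k$ terms, multiplied by $|\lambda|^k$, equals $c^{k+1}/(\lambda - c)$, which tends to zero. Exact rational arithmetic is used to avoid round-off.

```python
from fractions import Fraction

C = Fraction(2)   # hypothetical constant in f(lam) = 1 / (1 - C/lam)

def f(lam):
    """The sample function, whose expansion at infinity is sum_j C^j lam^(-j)."""
    return 1 / (1 - C / lam)

def partial_sum(lam, k):
    """Sum of the first k+1 terms f_j lam^(-j) with f_j = C^j."""
    return sum((C / lam) ** j for j in range(k + 1))

def scaled_remainder(lam, k):
    """|lam|^k * |f(lam) - partial_sum(lam, k)|; the o(|lam|^-k) condition says this -> 0."""
    return abs(lam) ** k * abs(f(lam) - partial_sum(lam, k))

r_small = scaled_remainder(Fraction(1000), 3)
r_large = scaled_remainder(Fraction(10 ** 6), 3)
```

Here `r_small` is exactly 16/998 and `r_large` is smaller still, in line with the $o(|\lambda|^{-k})$ requirement for $k = 3$.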

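Now suppose that we have matrix-valued functions $f(x, \lambda)$, defined for all $x$ in $\mathbf{R}$ and all $\lambda$ in $\mathbf{C}$ with $|\lambda|$ sufficiently large. Suppose that we have matrix-valued functions $f_n(x)$ such that for each $x$, $f(x, \lambda) \sim \sum_j f_j(x)\lambda^{-j}$. We will write $f \sim_{\mathbf{R}} \sum_j f_j\lambda^{-j}$ if this asymptotic expansion holds uniformly in $x$, i.e., if

$$\sup_{x \in \mathbf{R}} \Big\| f(x, \lambda) - \sum_{j=0}^{k} f_j(x)\lambda^{-j} \Big\| = o(|\lambda|^{-k}).$$

It is easy to explain the importance of the uniformity. Suppose $f$ and the $f_n$ are differentiable functions of $x$. Then the uniformity gives

$$\frac{f(x + \Delta x, \lambda) - f(x, \lambda)}{\Delta x} - \sum_{j=0}^{k} \frac{f_j(x + \Delta x) - f_j(x)}{\Delta x}\,\lambda^{-j} = o(|\lambda|^{-k}),$$

and letting $\Delta x$ approach zero gives $\frac{\partial f}{\partial x} \sim_{\mathbf{R}} \sum_j f_j'\lambda^{-j}$, i.e., we can differentiate such an asymptotic relation "term by term".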

Theorem C. For $u$ in $P_0$, the normalized eigenfunctions $m^u(x, \lambda)$ have an asymptotic expansion as $\lambda$ tends to infinity, $m^u \sim_{\mathbf{R}} \sum_j m^u_j\lambda^{-j}$. In fact the $m^u_j$ are uniquely determined inductively by the condition $[a, m^u_{j+1}(x)] = \frac{d}{dx}m^u_j(x) - u(x)m^u_j(x)$.

The normalized eigenfunctions, $m^u(x, \lambda)$, satisfy a simple relation, referred to as the "reality condition", that follows as an easy consequence of the fact that $u(x)$ takes its values in $su(2)$.

Proposition 2. If $u \in P$ then the normalized eigenfunctions $m^u$ satisfy the relation $m^u(x, \bar\lambda)^* m^u(x, \lambda) = I$.

So, passing to the limit as $\lambda \in \mathbf{C}_+$ approaches $r \in \mathbf{R}$,

Corollary. $m^u_-(x, r)^* m^u_+(x, r) = I$.

We will need one more property of the $m^u$ (or rather of their boundary values, $m^u_\pm$).

Proposition 3. Let $u \in P_0$ and $x \in \mathbf{R}$, and let $m^u_+(x, r) = g(x, r)h(x, r)$ be the canonical decomposition of $m^u_+(x, r)$ into the product of a unitary matrix $g(x, r)$ and an upper-triangular matrix $h(x, r)$. Then $h(x, r) - I$ is of Schwartz class in $r$.

2. Outline of Proofs 

As was the case for the scattering theory for the Schrodinger operator, it is a lot 
easier to see what is happening for the special case of potentials with compact 
support. It turns out for example that all such potentials are regular. Below we 
will give most of the details of the proofs of Theorems A, B, and C for the 2x2 
case when u has compact support. 

[In [BC1], the case of compactly supported potentials is considered first, followed by the case of "small potentials", i.e., those with $L^1$ norm less than 1. For the latter, it turns out that existence and uniqueness of the $m^u$ can be proved easily using the Banach Contraction Principle, and moreover it follows that $D_u$ is empty. The case of regular potentials (called "generic" in [BC1]) is then handled by a limiting argument. [BC1] also considers the general $n \times n$ case and does not assume that $u$ is necessarily skew-adjoint. This latter generality adds substantial extra complexity to the argument.]

In any interval $[a, b]$ in which $u$ vanishes identically, the modified parallel transport equation reduces to the Lax Equation $\phi_x = [a\lambda, \phi]$, so choosing an arbitrary $x_0$ in $[a, b]$, the solution is $\phi(x) = e^{a\lambda(x - x_0)}\phi(x_0)e^{-a\lambda(x - x_0)}$, or $\phi(x) = e^{a\lambda x}se^{-a\lambda x}$, where we define $s = e^{-a\lambda x_0}\phi(x_0)e^{a\lambda x_0}$. This proves:

Proposition 4. Suppose $u$ in $P$ has compact support, say $u(x) = 0$ for $|x| > M$. Then for each complex number $\lambda$ there is a unique solution $\phi^u(x, \lambda)$ of the modified parallel transport equation with $\phi^u(x, \lambda) = I$ for $x < -M$. Moreover, for $x > M$, $\phi^u$ has the form $\phi^u(x, \lambda) = e^{a\lambda x}s^u(\lambda)e^{-a\lambda x}$ (where $s^u(\lambda) = e^{-a\lambda M}\phi^u(M, \lambda)e^{a\lambda M}$), and for each real $x$, $\lambda \mapsto \phi^u(x, \lambda)$ is an entire function (i.e., holomorphic in all of $\mathbf{C}$).

The fact that $\phi^u$ is holomorphic in $\lambda$ is a consequence of the more general principle that if an ODE depends analytically on a parameter $\lambda$, then the solution of the equation with some fixed initial condition is analytic in $\lambda$. (In this case the initial value condition is $\phi^u(-M, \lambda) = I$.)

Definition. We will denote the matrix elements of $s^u(\lambda)$ by $s^u_{ij}(\lambda)$, and we define $D_u$ to be the set of all $\lambda$ in the upper half-plane that are zeroes of $s^u_{11}$ union the set of all $\lambda$ in the lower half-plane that are zeroes of $s^u_{22}$.

Remark. It can be shown that the holomorphic functions $s^u_{11}$ and $s^u_{22}$ are not identically zero, so that $D_u$ is a discrete set. In fact (cf. [BC1], section 4), $D_u$ is finite, and neither $s^u_{11}$ nor $s^u_{22}$ has any zeroes on the real axis.

Proposition 5. Suppose $u$ in $P$ has compact support. For each $\lambda \in \mathbf{C} \setminus (\mathbf{R} \cup D_u)$ there is a unique normalized eigenfunction $m^u(x, \lambda)$. For every $x$ in $\mathbf{R}$, $m^u(x, \lambda)$ is a meromorphic function of $\lambda$ for $\lambda$ in $\mathbf{C} \setminus \mathbf{R}$, with poles at the points of $D_u$. Finally, the restriction of $m^u(x, \lambda)$ to each half-plane has a smooth extension to the real axis.

Proof. Since $\phi^u(x, \lambda)$ is invertible, there is no loss of generality in assuming that a normalized eigenfunction has the form $m^u(x, \lambda) = \phi^u(x, \lambda)\chi^u(x, \lambda)$. Then $[a\lambda, m^u] + um^u = m^u_x = \phi^u_x\chi^u + \phi^u\chi^u_x$, which simplifies to the same Lax Equation as before, namely $\chi^u_x = [a\lambda, \chi^u]$, but now valid on the whole of $\mathbf{R}$, and it follows that $\chi^u(x, \lambda) = e^{a\lambda x}\chi^u(\lambda)e^{-a\lambda x}$, and hence $m^u(x, \lambda) = \phi^u(x, \lambda)e^{a\lambda x}\chi^u(\lambda)e^{-a\lambda x}$.

Then, by Proposition 4, for $x < -M$, $m^u(x, \lambda) = e^{a\lambda x}\chi^u(\lambda)e^{-a\lambda x}$ while for $x > M$, $m^u(x, \lambda) = e^{a\lambda x}s^u(\lambda)\chi^u(\lambda)e^{-a\lambda x}$.

Let us write $\chi^u_{ij}(\lambda)$ for the matrix elements of $\chi^u(\lambda)$, and try to determine them individually so that Conditions 1) and 2) of the definition of normalized eigenfunctions will be satisfied for the resulting $m^u(x, \lambda)$.

Note that, since conjugating $\chi^u(\lambda)$ by a diagonal matrix does not change its diagonal entries, the diagonal elements of $m^u(x, \lambda)$ are just $\chi^u_{11}(\lambda)$ and $\chi^u_{22}(\lambda)$ for $x < -M$. Since Condition 1) requires that $m^u(x, \lambda)$ converge to the identity matrix as $x$ approaches $-\infty$, it follows that we must take $\chi^u_{11}(\lambda) = \chi^u_{22}(\lambda) = 1$, and conversely with this choice Condition 1) is clearly satisfied.

On the other hand, an easy calculation shows that the off-diagonal elements, $m^u_{12}(x, \lambda)$ and $m^u_{21}(x, \lambda)$, are given respectively by $e^{-2i\lambda x}\chi^u_{12}(\lambda)$ and $e^{2i\lambda x}\chi^u_{21}(\lambda)$, when $x < -M$. If $\lambda = \sigma + i\tau$, $m^u_{12}(x, \lambda) = e^{-2i\sigma x}e^{2\tau x}\chi^u_{12}(\lambda)$, and $m^u_{21}(x, \lambda) = e^{2i\sigma x}e^{-2\tau x}\chi^u_{21}(\lambda)$. Since Condition 2) requires that these remain bounded when $x$ approaches $-\infty$, it follows that when $\lambda$ is in the lower half-plane (i.e., $\tau < 0$) then $\chi^u_{12}(\lambda) = 0$, and similarly, $\chi^u_{21}(\lambda) = 0$ for $\lambda$ in the upper half-plane.
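
The growth and decay of the off-diagonal entries under conjugation by $e^{a\lambda x}$ can be seen directly in a small computation. The Python sketch below assumes a = diag(-i, i), so that $e^{a\lambda x} = \mathrm{diag}(e^{-i\lambda x}, e^{i\lambda x})$; the sample matrix and the value of $\lambda$ are hypothetical.

```python
import cmath
import math

def conjugate_by_exp(X, lam, x):
    """Return e^{a lam x} X e^{-a lam x} for a = diag(-i, i) (assumed convention)."""
    e = cmath.exp(-1j * lam * x)       # upper-left entry of e^{a lam x}
    return [[X[0][0], X[0][1] * e * e],        # (1,2) entry picks up e^{-2 i lam x}
            [X[1][0] / (e * e), X[1][1]]]      # (2,1) entry picks up e^{+2 i lam x}

chi = [[1, 0.5 + 0.25j], [0.3 - 0.1j, 1]]   # sample matrix with unit diagonal
lam_lower = 1.0 - 0.5j                       # tau = -0.5 < 0: lower half-plane

xs = (-1.0, -5.0, -10.0)
size_12 = [abs(conjugate_by_exp(chi, lam_lower, x)[0][1]) for x in xs]
size_21 = [abs(conjugate_by_exp(chi, lam_lower, x)[1][0]) for x in xs]
```

For $\tau < 0$ the (1,2) entry has modulus $e^{2\tau x}|\chi_{12}|$, which grows without bound as $x \to -\infty$ unless $\chi_{12} = 0$, while the (2,1) entry decays; the roles are exchanged in the upper half-plane.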

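Next, take $x > M$, so that $m^u(x, \lambda) = e^{a\lambda x}s^u(\lambda)\chi^u(\lambda)e^{-a\lambda x}$. Then another easy computation shows that if $\lambda$ is in the upper half-plane, then $m^u_{12}(x, \lambda) = e^{-2i\lambda x}(s^u_{11}(\lambda)\chi^u_{12}(\lambda) + s^u_{12}(\lambda))$, while $m^u_{21}(x, \lambda) = e^{2i\lambda x}s^u_{21}(\lambda)$, which decays as $x$ approaches $+\infty$. Since $m^u_{11}(x, \lambda) = s^u_{11}(\lambda)$ and $m^u_{22}(x, \lambda) = s^u_{21}(\lambda)\chi^u_{12}(\lambda) + s^u_{22}(\lambda)$ are independent of $x$, the condition for $m^u(x, \lambda)$ to remain bounded when $x$ approaches $+\infty$ is just $s^u_{11}(\lambda)\chi^u_{12}(\lambda) + s^u_{12}(\lambda) = 0$, and this uniquely determines $\chi^u_{12}(\lambda)$, namely $\chi^u_{12}(\lambda) = -s^u_{12}(\lambda)/s^u_{11}(\lambda)$. So for $\lambda$ in the upper half-plane $\chi^u(\lambda) = \begin{pmatrix} 1 & -s^u_{12}(\lambda)/s^u_{11}(\lambda) \\ 0 & 1 \end{pmatrix}$ is the unique choice of $\chi^u$ satisfying Conditions 1) and 2). A similar computation shows that for $\lambda$ in the lower half-plane $\chi^u(\lambda) = \begin{pmatrix} 1 & 0 \\ -s^u_{21}(\lambda)/s^u_{22}(\lambda) & 1 \end{pmatrix}$. The remaining conclusions of the proposition follow from these explicit formulas and the fact that $s^u_{11}$ and $s^u_{22}$ have no zeroes on the real axis. ■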
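
The choice of $\chi^u(\lambda)$ in the proof can be tested on a sample scattering matrix. In the Python sketch below the entries of `s` are hypothetical numbers (chosen so that det s = 1, which is an assumption of the example), and `chi_from_s` is an illustrative helper name.

```python
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chi_from_s(s, lam):
    """Constant matrix chi(lam) built from the entries of s(lam).

    Upper half-plane: [[1, -s12/s11], [0, 1]]; lower half-plane: [[1, 0], [-s21/s22, 1]].
    """
    if lam.imag > 0:
        return [[1, -s[0][1] / s[0][0]], [0, 1]]
    if lam.imag < 0:
        return [[1, 0], [-s[1][0] / s[1][1], 1]]
    raise ValueError("lam must not be real")

# Hypothetical sample values; s22 is chosen so that det s = 1.
s11, s12, s21 = 2 + 1j, 1 - 1j, 0.5j
s22 = (1 + s12 * s21) / s11
s = [[s11, s12], [s21, s22]]

upper = mul(s, chi_from_s(s, 1 + 2j))    # s * chi for lam in the upper half-plane
lower = mul(s, chi_from_s(s, 1 - 2j))    # s * chi for lam in the lower half-plane
```

In the upper half-plane the (1,2) entry of `s * chi` vanishes, which removes the growing factor $e^{-2i\lambda x}$ for large positive $x$; in the lower half-plane the same happens for the (2,1) entry. The division by `s11` or `s22` is where the poles at the points of $D_u$ come from.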
Lemma. If $\psi_x = A\psi$ and $\phi_x = -\phi A$ then $\phi\psi$ is constant.

Proof. $(\phi\psi)_x = \phi_x\psi + \phi\psi_x = 0$. ■

We can now prove Proposition 2.

Proof. It will suffice to prove that $m^u(x, \bar\lambda)^* m^u(x, \lambda)$ is constant, since we know that as $x$ approaches $-\infty$ the product converges to $I$. If we define $\psi(x, \lambda) = m^u(x, \lambda)e^{a\lambda x}$, then $\psi(x, \bar\lambda)^* = e^{-a\lambda x}m^u(x, \bar\lambda)^*$, and therefore $m^u(x, \bar\lambda)^* m^u(x, \lambda) = \psi(x, \bar\lambda)^*\psi(x, \lambda)$ and it will suffice to prove that $\psi(x, \bar\lambda)^*\psi(x, \lambda)$ is constant. By Proposition 1, $\psi_x(x, \lambda) = (a\lambda + u)\psi(x, \lambda)$. Since $u^* = -u$ and $(a\bar\lambda)^* = -a\lambda$, $\psi_x(x, \bar\lambda)^* = \psi(x, \bar\lambda)^*(a\bar\lambda + u)^* = -\psi(x, \bar\lambda)^*(a\lambda + u)$, and the preceding lemma completes the proof. ■

Our Theorem C is just Theorem 6.1, page 58, of [BC1]. While the proof is 
not difficult, neither is it particularly illuminating, and we will not repeat it here. 
Similarly, our Proposition 3 follows from Theorem E', page 44 of [BC1]. 

This completes our discussion of the proofs of Theorems A, B, C, and Proposi- 
tions 2 and 3. In the remainder of this section we will see how these results can be 
used to complete the proof of the NLS Hierarchy Theorem. 

Since $m^u(x, \lambda)^{-1} = m^u(x, \bar\lambda)^*$, it follows that $m^u(x, \lambda)\,a\,(m^u(x, \lambda))^{-1}$ has an asymptotic expansion.

Definition. We denote the function $m^u(x, \lambda)\,a\,(m^u(x, \lambda))^{-1}$ by $Q^u(x, \lambda)$.

So by the preceding remark,

Corollary. $Q^u(x, \lambda)$ has an asymptotic expansion $Q^u \sim_{\mathbf{R}} \sum_{j=0}^{\infty} Q^u_j\lambda^{-j}$, with $Q^u_0 = a$, hence also $Q^u_x \sim_{\mathbf{R}} \sum_{j=0}^{\infty} (Q^u_j)_x\lambda^{-j}$.

Lemma. If we define $\psi(x, \lambda) = m^u(x, \lambda)e^{a\lambda x}$ then $Q^u(x, \lambda) = \psi a\psi^{-1}$.

Proof. Immediate from the fact that all diagonal matrices commute. ■

Now $(\psi a\psi^{-1})_x = \psi_x a\psi^{-1} + \psi a(\psi^{-1})_x$, and by Proposition 1, $\psi_x = (a\lambda + u)\psi$. Also, from $\psi\psi^{-1} = I$ we get $\psi_x\psi^{-1} + \psi(\psi^{-1})_x = 0$. Combining all these facts gives $(\psi a\psi^{-1})_x = [a\lambda + u, \psi a\psi^{-1}]$, and hence, by the lemma, $Q^u_x(x, \lambda) = [a\lambda + u, Q^u(x, \lambda)]$. If we insert in this identity the asymptotic expansion $Q^u \sim_{\mathbf{R}} \sum_{j=0}^{\infty} Q^u_j\lambda^{-j}$ we find a second asymptotic expansion for $Q^u_x(x, \lambda)$, in addition to the one from the above Corollary, namely $Q^u_x \sim_{\mathbf{R}} \sum_j ([a, Q^u_{j+1}] + [u, Q^u_j])\lambda^{-j}$. Therefore, by uniqueness of asymptotic expansions we have proved:

Proposition 6. The recursion relation $(Q^u_j)_x = [a, Q^u_{j+1}] + [u, Q^u_j]$ is satisfied by the coefficients $Q^u_j$ of the asymptotic expansion of $Q^u(x, \lambda)$, and hence they are identical with the functions $Q_j(u) : \mathbf{R} \to su(2)$ defined in the NLS Hierarchy Theorem.

We are now finally in a position to complete the proof of the NLS Hierarchy 
Theorem. 

Since $a^2 = -I$, it follows that also $Q^u(x, \lambda)^2 = (m^u a(m^u)^{-1})^2 = -I$, and hence $-I \sim \left(\sum_{j=0}^{\infty} Q^u_j\lambda^{-j}\right)^2$. Expanding and comparing coefficients of $\lambda^{-k}$, uniqueness of asymptotic expansions gives $aQ_k(u) + Q_k(u)a = -\sum_{j=1}^{k-1} Q_j(u)Q_{k-j}(u)$. Recall that we needed one fact to complete the proof of statements a) through d) of the NLS Hierarchy Theorem, namely that if $Q_k(u) = P_k(u) + T_k(u)$ is the decomposition of $Q_k(u)$ into its off-diagonal part and its diagonal part, then the matrix elements of $T_k(u)$ are polynomials in the matrix elements of $u$ and their derivatives. Moreover, we saw that we could assume inductively that this was true for the matrix elements of $Q_j(u)$ for $j < k$. But if $T_k = \begin{pmatrix} t_k & 0 \\ 0 & -t_k \end{pmatrix}$, then $aQ_k(u) + Q_k(u)a = -2it_k I = \begin{pmatrix} -2it_k & 0 \\ 0 & -2it_k \end{pmatrix}$, and the desired result is now immediate from the inductive assumption.
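
The identity $aQ_k + Q_k a = -\sum_{j=1}^{k-1} Q_jQ_{k-j}$ can be checked for $k = 2$ against the matrices computed earlier, $Q_1(u) = u$ and $Q_2(u)$ with diagonal entries $\pm\frac{i}{2}|q|^2$. The Python sketch below assumes a = diag(-i, i); the numerical values of $q$ and $q_x$ are hypothetical samples at a single point.

```python
A = [[-1j, 0], [0, 1j]]   # assumed convention a = diag(-i, i)

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

def anticommutator_with_a(X):
    """a X + X a."""
    AX, XA = mul(A, X), mul(X, A)
    return [[AX[i][j] + XA[i][j] for j in range(2)] for i in range(2)]

q, q_x = 0.4 + 0.9j, -1.1 + 0.2j          # sample values of q and q_x at one point
u = [[0, q], [-q.conjugate(), 0]]          # Q_1(u) = u
t2 = 0.5j * abs(q) ** 2                    # upper-left entry of T_2(u)
Q2 = [[t2, 0.5j * q_x], [0.5j * q_x.conjugate(), -t2]]

lhs = anticommutator_with_a(Q2)            # a Q_2 + Q_2 a
uu = mul(u, u)
rhs = [[-uu[i][j] for j in range(2)] for i in range(2)]   # -Q_1 Q_1

a_squared = mul(A, A)
```

Both sides equal $|q|^2 I$, and only the diagonal part of $Q_2$ contributes to the left-hand side, which is how the identity determines $T_k$ algebraically from the lower-order terms.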

The other statement of the NLS Hierarchy Theorem that remains to be proved 
is e). 

Define a function $F^u(x, \lambda) = \mathrm{tr}(Q^u(x, \lambda)a)$. Clearly $F^u(x, \lambda)$ has an asymptotic expansion, $F^u \sim_{\mathbf{R}} \sum_j F^u_j\lambda^{-j}$, where $F^u_j = \mathrm{tr}(Q^u_j a)$.

From what we have just seen, $F^u(x, \lambda)$ is Schwartz class in $x$, so we can define a map $F(u, \lambda) = \int_{-\infty}^{\infty} F^u(x, \lambda)\,dx = \int_{-\infty}^{\infty} \mathrm{tr}(Q^u a)\,dx$, and $F(u, \lambda) \sim \sum_j F_j(u)\lambda^{-j}$ where $F_j(u) = \int_{-\infty}^{\infty} F^u_j(x)\,dx$.

If we consider $u \mapsto F(u, \lambda)$ as a function on $P$, then $\nabla F$ is a vector field on $P$, and $(\nabla F)_u \sim \sum_j (\nabla F_j)_u\lambda^{-j}$. We claim that statement e) of the NLS Hierarchy Theorem follows from the following proposition. (For a proof, see Proposition 2.4 of [Te2].)

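Proposition 7. If $v$ in $P$ then

$$\frac{d}{d\epsilon}\Big|_{\epsilon=0} F(u + \epsilon v, \lambda) = \int_{-\infty}^{\infty} \mathrm{tr}\left(\frac{dQ^u(x, \lambda)}{d\lambda}\,v(x)\,a\right) dx.$$

Indeed, expand both sides of the latter equality in asymptotic series in $\lambda$, and compare coefficients of $\lambda^{-j}$. Since $\frac{dQ^u(x, \lambda)}{d\lambda} \sim \sum_j -jQ_j\lambda^{-j-1}$, we find $(dF_j)_u(v) = \int_{-\infty}^{\infty} \mathrm{tr}\big(-(j-1)Q_{j-1}(u)(x)v(x)a\big)\,dx$. Recalling the definition of the inner product in $P$, we see $-\frac{1}{j-1}(\nabla F_j)_u$ is the projection of $Q_{j-1}(u)$ on $T^\perp$, i.e., the off-diagonal part of $Q_{j-1}(u)$. So if we define $H_j(u) = -\frac{1}{j+1}\int_{-\infty}^{\infty} \mathrm{tr}(Q_{j+2}(u)a)\,dx = -\frac{1}{j+1}F_{j+2}(u)$, then $(\nabla H_j)_u = -\frac{1}{j+1}(\nabla F_{j+2})_u$ is the off-diagonal part of $Q_{j+1}(u)$, which is statement e) of the NLS Hierarchy Theorem.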

7. Loop Groups, Dressing Actions, and Inverse Scattering 
1. Secret Sources of Soliton Symmetries 

This article is titled "The Symmetries of Solitons" , and we have been hinting that 
many of the remarkable properties of soliton equations are closely related to the 
existence of large and non-obvious groups of symplectic automorphisms that act on 
the phase spaces of these Hamiltonian systems and leave the Hamiltonian function 
invariant. We are now finally in a position where we can describe these groups and 
their symplectic actions. 

The groups themselves are so-called loop groups. While they have been around in various supporting roles for much longer, in the past three decades they have been increasingly studied for their own sake, and have attained a certain prominence. See for example [PrS].

Given any Lie group $G$, we can define its associated loop group, $L(G)$, as the group of all maps (of some appropriate smoothness class) of $S^1$ into $G$, with pointwise composition. For our purposes we will always assume that $G$ is a matrix group and the maps are smooth (i.e., infinitely differentiable).

The theory gets more interesting when we regard the loops in $G$ as boundary values of functions that are holomorphic (or meromorphic) in the interior (or exterior) of the unit disk and take subgroups by restricting the analytic properties of these analytic extensions. That is, we concentrate on the analytic extensions rather than the boundary value.

Once we take this point of view, it is just as natural to pre-compose with a fixed linear fractional transformation mapping the real line to the unit circle (say $z \mapsto (1 + iz)/(1 - iz)$), so that elements of the loop groups become maps of $\mathbf{R}$ into $G$ that are boundary values of certain analytic functions in the upper or lower half-plane, and this is the point of view we will adopt. Note that the above linear fractional transformation takes $-1$ in $S^1$ to infinity, and for certain purposes it is important to know how the nature of the original map of $S^1$ into $G$ at $-1$ translates to properties of the transformed map of $\mathbf{R}$ into $G$ at $\pm\infty$. A straightforward calculation gives the following answer:

Proposition 1. ([TU], Proposition 7.7) Given $g : S^1 \to GL(n, \mathbf{C})$, define $\Phi(g) : \mathbf{R} \to GL(n, \mathbf{C})$ by $\Phi(g)(r) = g\left(\frac{1 + ir}{1 - ir}\right)$. Then:

(i) $g$ is smooth if and only if $\Phi(g)$ is smooth and has asymptotic expansions at $+\infty$ and at $-\infty$ and these expansions agree.

(ii) $g - I$ is infinitely flat at $z = -1$ if and only if $\Phi(g) - I$ is of Schwartz class.

(iii) $g : \mathbf{C} \to GL(n, \mathbf{C})$ satisfies the reality condition $g(1/\bar z)^* g(z) = I$ if and only if $\Phi(g)(\lambda) = g\left(\frac{1 + i\lambda}{1 - i\lambda}\right)$ satisfies $\Phi(g)(\bar\lambda)^*\Phi(g)(\lambda) = I$.
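
The elementary properties of the linear fractional transformation that underlie this proposition can be confirmed numerically. The Python sketch below is illustrative (the function name `cayley` is a label chosen here, not one used in the text).

```python
def cayley(lam):
    """The linear fractional map lam -> (1 + i lam) / (1 - i lam)."""
    return (1 + 1j * lam) / (1 - 1j * lam)

on_circle = abs(cayley(2.5))              # a real point should land on the unit circle
near_minus_one = cayley(1e9)              # large |lam| should land near z = -1
near_minus_one_neg = cayley(-1e9)

lam = 0.3 + 0.7j                          # a sample point in the upper half-plane
z = cayley(lam)
inside_disk = abs(z)                      # upper half-plane maps into the unit disk
reflected = 1 / z.conjugate()             # reflection of z in the unit circle
image_of_conjugate = cayley(lam.conjugate())
```

The last two quantities coincide, which is why the reality condition $g(1/\bar z)^* g(z) = I$ on the circle side corresponds to $\Phi(g)(\bar\lambda)^*\Phi(g)(\lambda) = I$ on the line side.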

The first, and most important, loop group we will need is called $\mathcal{D}_-$. The analytic properties of its elements are patterned after those proved to hold in the preceding section for the normalized eigenfunctions $m^u(x, \lambda)$ as functions of $\lambda$.

Definition. We will denote by $\mathcal{D}_-$ the group of all meromorphic maps $f : \mathbf{C} \setminus \mathbf{R} \to GL(n, \mathbf{C})$ having the following properties:

1) $f(\bar\lambda)^* f(\lambda) = I$.

2) $f$ has an asymptotic expansion $f(\lambda) \sim I + f_1\lambda^{-1} + f_2\lambda^{-2} + \cdots$.

3) The set of poles of $f$ is finite.

4) $f$ restricted to the upper half-plane, $\mathbf{C}_+$, extends to a smooth function on the closure of the upper half-plane, and similarly for the lower half-plane. The boundary values are then maps $f_\pm : \mathbf{R} \to GL(n, \mathbf{C})$, and by 1) they satisfy $f_+(r)^* f_-(r) = I$.

5) If $f_+(r) = g(r)h(r)$ is the factorization of $f_+(r)$ as the product of a unitary matrix $g(r)$ and an upper triangular $h(r)$, then $h - I$ is of Schwartz class.

Definition. We define a map, $\mathcal{F}_{scat} : P_0 \to \mathcal{D}_-$, the Scattering Transform, by $\mathcal{F}_{scat}(u)(\lambda) = f^u(\lambda) = m^u(0, \lambda)$.

That $m^u(0, \lambda)$ is in fact an element of $\mathcal{D}_-$ is a consequence of the definition of the set $P_0$ of regular potentials, and Theorems A and C and Propositions 2 and 3 of the preceding section. There is nothing special about $0$ in the above definition. We could have equally well chosen any other fixed real number $x_0$ and used $m^u(x_0, \lambda)$ instead of $m^u(0, \lambda)$.

2. Terng-Uhlenbeck Factoring and the Dressing Action 

There are three other loop groups that play an essential role in the definition of the Inverse Scattering Transform, $\mathcal{IF}_{scat}$, and we define these next.

Definition. We will denote by $\mathcal{G}_+$ the loop group of all entire functions $h : \mathbf{C} \to GL(n, \mathbf{C})$, and by $\mathcal{H}_+$ the abelian subgroup of $\mathcal{G}_+$ consisting of all elements of the form $e^{aP(\lambda)}$ where $P : \mathbf{C} \to \mathbf{C}$ is a polynomial in $\lambda$. Finally, we define $\mathcal{H}_-$ to be the subgroup of $\mathcal{D}_-$ consisting of those elements $f$ taking values $f(\lambda)$ in the diagonal subgroup of $GL(n, \mathbf{C})$. For each $x$ in $\mathbf{R}$ we define $e_a(x)$ in $\mathcal{H}_+$ by $e_a(x)(\lambda) = e^{a\lambda x}$, and for each positive integer $j$ we define a one-parameter subgroup $e_{a,j}$ of $\mathcal{H}_+$ by $e_{a,j}(t) = e^{a\lambda^j t}$. (Note that $e_a(x) = e_{a,1}(x)$.)
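
Since $a$ is diagonal, the elements $e_{a,j}(t)$ are diagonal matrices for each $\lambda$, and their group properties can be read off entry by entry. The Python sketch below assumes the 2x2 case with a = diag(-i, i) and evaluates everything at one sample value of $\lambda$; the function name is hypothetical.

```python
import cmath

def e_aj(j, t, lam):
    """Value at lam of e_{a,j}(t) = exp(a lam^j t), assuming a = diag(-i, i)."""
    phase = cmath.exp(-1j * lam ** j * t)
    return [[phase, 0], [0, 1 / phase]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

lam = 0.6 + 0.3j

# One-parameter subgroup property: e_{a,2}(t1) e_{a,2}(t2) = e_{a,2}(t1 + t2).
is_subgroup = close(mul(e_aj(2, 0.3, lam), e_aj(2, 0.5, lam)), e_aj(2, 0.8, lam))

# Different flows commute because all of these matrices are diagonal.
commute = close(mul(e_aj(1, 0.4, lam), e_aj(3, 0.7, lam)),
                mul(e_aj(3, 0.7, lam), e_aj(1, 0.4, lam)))

# e_a(x) is the case j = 1: its upper-left entry is exp(-i lam x).
first_entry_matches = abs(e_aj(1, 0.4, lam)[0][0] - cmath.exp(-1j * lam * 0.4)) < 1e-12
```

This commutativity of the $e_{a,j}$ is the elementary fact from which the commutativity of the corresponding flows is later deduced.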

The following theorem is one of the basic results of [TU]. As we shall see, it provides 
an alternative, group theoretic approach to ZS-AKNS Inverse Scattering Theory. 
(In fact, conversely, it can be proved using earlier approaches to ZS-AKNS Inverse 
Scattering Theory). 

Terng-Uhlenbeck Factoring Theorem. ([TU], 7.11 and 7.16) If $f \in \mathcal{D}_-$ then:

1) for any $h \in \mathcal{H}_+$, $hf^{-1} : \mathbf{C} \setminus (\mathbf{R} \cup D_f) \to GL(n, \mathbf{C})$ can be factored uniquely in the form $hf^{-1} = M^{-1}E$, with $M$ in $\mathcal{D}_-$ and $E$ in $\mathcal{G}_+$.

2) Taking $h = e_{a,1}(x)$ in 1) (i.e., $h(\lambda) = e^{a\lambda x}$), we get a one-parameter family of such factorings, $e_{a,1}(x)f^{-1} = M^{-1}(x)E(x)$ and, writing $E_x$ for the derivative of $E$, it follows that $E_x = (a\lambda + u)E$ for a unique, regular potential $u$ in $P_0$.

We note that in 1) uniqueness is easy and only existence of the decomposition needs proof. Indeed, uniqueness is equivalent to the statement that $\mathcal{D}_- \cap \mathcal{G}_+ = \{I\}$, and this is immediate from Liouville's Theorem that bounded holomorphic functions are constant (recall that elements of $\mathcal{D}_-$ converge to $I$ as $\lambda \to \infty$). The existence part of 1) follows from the two classical Birkhoff Decomposition Theorems, and statement 2) gives the dependence of this factorization on the parameter $x$.

Definition. We define a left action of $\mathcal{H}_+$ on $\mathcal{D}_-$, called the dressing action, and denoted by $(h, f) \mapsto h * f$. It is defined by $h * f = M$, where $M$ is given by the factoring of $hf^{-1}$ in 1) of the previous theorem.

Of course we must check that $(h_1h_2) * f = h_1 * (h_2 * f)$, but this is easy. Suppose $h_2f^{-1} = M_2^{-1}E_2$, i.e., $h_2 * f = M_2$, and use the factoring theorem again to write $h_1M_2^{-1}$ as a product, $h_1M_2^{-1} = M_1^{-1}E_1$, i.e., $h_1 * M_2 = M_1$. Then $(h_1h_2)f^{-1} = h_1(h_2f^{-1}) = h_1M_2^{-1}E_2 = M_1^{-1}E_1E_2$, so $(h_1h_2) * f = M_1 = h_1 * M_2 = h_1 * (h_2 * f)$.

Now that we have an action of $\mathcal{H}_+$ on $\mathcal{D}_-$, it follows that every one-parameter subgroup of $\mathcal{H}_+$ defines a flow on $\mathcal{D}_-$. In particular the one-parameter subgroups $e_{a,j}$ define an important sequence of flows on $\mathcal{D}_-$.

Definition. For each positive integer $j$ we define a flow on $\mathcal{D}_-$, called the $j$-th flow, by $(t, f) \mapsto e_{a,j}(t) * f$.

Of course, since $\mathcal{H}_+$ is an abelian group and all the $e_{a,j}$ are one-parameter subgroups of $\mathcal{H}_+$, it follows that this sequence of flows all mutually commute.

3. The Inverse Scattering Transform 

We are now in a position to define the Inverse Scattering Transform. 

Definition. We define a map $\mathcal{IF}_{scat} : \mathcal{D}_- \to \mathcal{P}_0$,
called the Inverse Scattering Transform, by associating to $f$ in $\mathcal{D}_-$ the
regular potential $u = \mathcal{IF}_{scat}(f)$ in $\mathcal{P}_0$ given by 2) of the
Terng-Uhlenbeck Factoring Theorem. That is, if we define
$\psi(x, \lambda) = (e_{a,1}(x) * f)(\lambda)\,e^{a\lambda x}$, then $u$ is
characterized by the fact that $\psi$ satisfies the parallel transport equation
with potential $u$, $\psi_x = (a\lambda + u)\psi$.
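
Theorem D. The maps $\mathcal{F}_{scat} : \mathcal{P}_0 \to \mathcal{D}_-$ and
$\mathcal{IF}_{scat} : \mathcal{D}_- \to \mathcal{P}_0$ satisfy:

a) $\mathcal{IF}_{scat} \circ \mathcal{F}_{scat} = $ identity.

b) $\mathcal{F}_{scat} \circ \mathcal{IF}_{scat}(f) \in f\mathcal{H}_-$.

Thus, the map $\mathcal{P}_0 \to \mathcal{D}_- \to \mathcal{D}_-/\mathcal{H}_-$ that
is the composition of $\mathcal{F}_{scat}$ and the natural projection of
$\mathcal{D}_-$ on $\mathcal{D}_-/\mathcal{H}_-$ is a bijection.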

Recall that in the NLS-Hierarchy Theorem we defined a sequence of flows on
$\mathcal{P}_0$, the j-th of which we also called the "j-th flow". As you probably
suspect:

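Theorem E. ([TU] Theorem 8.1) The transforms
$\mathcal{F}_{scat} : \mathcal{P}_0 \to \mathcal{D}_-$ and
$\mathcal{IF}_{scat} : \mathcal{D}_- \to \mathcal{P}_0$ are equivariant with respect
to the j-th flow on $\mathcal{D}_-$ and the j-th flow on $\mathcal{P}_0$. In
particular if $u(t)$ in $\mathcal{P}_0$ is a solution of the j-th flow, then
$\mathcal{F}_{scat}(u(t)) = e_{a,j}(t) * \mathcal{F}_{scat}(u(0))$.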

Corollary. The following algorithm finds the solution $u(x,t)$ for the j-th flow in
$\mathcal{P}_0$ with initial condition $u(0) = u(x,0)$:

1) Compute the parallel translation operator $\psi(x,0,\lambda)$ having the correct
asymptotic behavior. That is, solve the following linear ODE problem:

a) $\psi_x(x,0,\lambda) = (a\lambda + u(x,0))\,\psi(x,0,\lambda)$.

b) $\lim_{x \to -\infty} \psi(x,0,\lambda)\,e^{-a\lambda x} = I$.

c) $\psi(x,0,\lambda)\,e^{-a\lambda x}$ is bounded.

2) Define $f$ in $\mathcal{D}_-$ by $f(\lambda) = \psi(0,0,\lambda)$.

3) Factor $e_{a,j}(t)\,e_{a,1}(x)\,f^{-1}$ as $M(x,t)^{-1}E(x,t)$, where
$M(x,t) \in \mathcal{D}_-$ and $E(x,t) \in \mathcal{G}_+$.

4) Then, putting $\psi(x,t,\lambda) = M(x,t)(\lambda)\,e^{a\lambda x + a\lambda^j t}$,

$u(x,t) = \psi_x(x,t,\lambda)\,\psi^{-1}(x,t,\lambda) - a\lambda$.
(The RHS is independent of $\lambda$.)

Proof. This just says that
$u(t) = \mathcal{IF}_{scat}(e_{a,j}(t) * \mathcal{F}_{scat}(u(0)))$. ■
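
As a sanity check, the four steps can be run by hand on the trivial initial
condition $u(x,0) = 0$. This is only an illustrative sketch. It assumes
$e_{a,j}(t)(\lambda) = e^{a\lambda^j t}$ and that $\mathcal{G}_+$ contains the entire
map $\lambda \mapsto e^{a\lambda x + a\lambda^j t}$, as in the earlier definitions.

$$\begin{aligned}
&\text{1)}\quad \psi_x = a\lambda\,\psi,\ \ \psi e^{-a\lambda x} \to I
  \ \Longrightarrow\ \psi(x,0,\lambda) = e^{a\lambda x},\\
&\text{2)}\quad f(\lambda) = \psi(0,0,\lambda) = I,\\
&\text{3)}\quad e_{a,j}(t)\,e_{a,1}(x)\,f^{-1} = e^{a\lambda x + a\lambda^j t}
  = M^{-1}E \ \text{ with } M = I,\ E = e^{a\lambda x + a\lambda^j t},\\
&\text{4)}\quad \psi(x,t,\lambda) = e^{a\lambda x + a\lambda^j t},\qquad
  u(x,t) = \psi_x\psi^{-1} - a\lambda = a\lambda - a\lambda = 0.
\end{aligned}$$

So the zero potential is a fixed point of every flow, and the right-hand side in
step 4) is indeed independent of $\lambda$ in this case.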

4. ZS-AKNS Scattering Coordinates

An important ingredient of the KdV Inverse Scattering Method, based on the
Schrödinger operator, was that the "coordinates" of the scattering data evolved
by a linear ODE with constant coefficients, and so this evolution could be solved
explicitly. Recall that this allowed us to derive an explicit formula for the KdV
multi-solitons. Such scattering coordinates (or "action-angle variables") also exist
for the ZS-AKNS Hierarchy, and even for the more general n x n systems, but the
story is somewhat more complicated in this case, so we will only outline the theory
here and refer to [ZS] and [BS] for more complete descriptions.

Another advantage of the loop group approach is that it permits us to factor the 
scattering data into discrete and continuous parts. To a certain extent this allows 
us to discuss separately the scattering coordinates and evolution of each part. 

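Definition. We define two subgroups, $\mathcal{D}_-^{cont}$ and
$\mathcal{D}_-^{disc}$, of $\mathcal{D}_-$, by

$\mathcal{D}_-^{cont} = \{f \in \mathcal{D}_- \mid f$ is holomorphic in
$\mathbf{C} \setminus \mathbf{R}\}$, and

$\mathcal{D}_-^{disc} = \{f \in \mathcal{D}_- \mid f$ is meromorphic in
$\mathbf{C}\}$.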

Remark. Since elements of $\mathcal{D}_-$ approach $I$ at infinity, it follows that
any $f$ in $\mathcal{D}_-^{disc}$ is actually meromorphic on the whole Riemann
sphere, and hence a rational function of the form
$f_{ij}(\lambda) = P_{ij}(\lambda)/Q_{ij}(\lambda)$, where the polynomial maps
$P_{ij}$ and $Q_{ij}$ have the same degree for a given diagonal entry, and $Q_{ij}$
has larger degree for an off-diagonal entry. For this reason,
$\mathcal{D}_-^{disc}$ is also referred to as the rational subgroup of
$\mathcal{D}_-$. Also, since $f$ satisfies the reality condition
$f(\bar\lambda)^* f(\lambda) = I$ and is holomorphic on the real line, it follows
that for $r$ in $\mathbf{R}$, $f(r)^* f(r) = I$ (i.e., $f$ is unitary on
$\mathbf{R}$), and the boundary values $f_+$ of $f$ from $\mathbf{C}_+$ and $f_-$
from $\mathbf{C}_-$ are equal, so that the "jump",
$v^f(r) = f_-^{-1}(r) f_+(r)$, is the identity.

Theorem F. ([TU] Theorem 7.5) Every $f$ in $\mathcal{D}_-$ can be factored uniquely
as a product $f = hg$ where $h \in \mathcal{D}_-^{cont}$ and
$g \in \mathcal{D}_-^{disc}$. In fact the multiplication map
$\mathcal{D}_-^{cont} \times \mathcal{D}_-^{disc} \to \mathcal{D}_-$ is a
diffeomorphism.

Proof. This is an immediate consequence of Proposition 1 of the previous section 
and the following classical theorem of G. D. Birkhoff. ■ 

Birkhoff Decomposition Theorem. ([PrS], Theorem 8.1.1)
Let $L(GL(n, \mathbf{C}))$ denote the loop group of all smooth maps of $S^1$ into
$GL(n, \mathbf{C})$, $\Omega U(n)$ the subgroup of all smooth maps $g$ of $S^1$ into
$U(n)$ such that $g(-1) = I$, and $L_+(GL(n, \mathbf{C}))$ the subgroup of
$L(GL(n, \mathbf{C}))$ consisting of all $g$ that are the boundary values of
holomorphic maps of the open unit disk into $GL(n, \mathbf{C})$. Then any $f$ in
$L(GL(n, \mathbf{C}))$ can be factored uniquely as a product $f = gh$ where
$g \in L_+(GL(n, \mathbf{C}))$ and $h \in \Omega U(n)$. In fact the multiplication
map $L_+(GL(n, \mathbf{C})) \times \Omega U(n) \to L(GL(n, \mathbf{C}))$ is a
diffeomorphism.

Definition. Given $z \in \mathbf{C}$ and an orthogonal projection $\pi$ in
$GL(n, \mathbf{C})$ we define $g_{z,\pi}$ in $\mathcal{D}_-^{disc}$ by
$g_{z,\pi}(\lambda) = I + \frac{z - \bar z}{\lambda - z}\,\pi$.
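
The defining properties of such a simple factor are easy to test numerically.
The sketch below is only an illustration for the 2 x 2 case. It assumes the form
$g_{z,\pi}(\lambda) = I + \frac{z - \bar z}{\lambda - z}\pi$ and the reality
condition $g(\bar\lambda)^* g(\lambda) = I$, and all helper names (`projection`,
`simple_factor`, and so on) are hypothetical. It checks the reality condition,
the closed form $g_{z,\pi}^{-1}(\lambda) = I + \frac{\bar z - z}{\lambda - \bar z}\pi$
of the inverse, and the normalization $g_{z,\pi} \to I$ at infinity.

```python
# Minimal 2x2 sketch in pure Python (no external libraries).
# ASSUMPTION: g_{z,pi}(lam) = I + (z - conj(z)) / (lam - z) * pi.

def matmul(A, B):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def projection(v):
    """Orthogonal projection of C^2 onto the line spanned by v."""
    norm2 = sum(abs(c) ** 2 for c in v)
    return [[v[i] * v[j].conjugate() / norm2 for j in range(2)]
            for i in range(2)]

def identity_plus(c, P):
    """Return the matrix I + c * P."""
    return [[(1 if i == j else 0) + c * P[i][j] for j in range(2)]
            for i in range(2)]

def simple_factor(z, P, lam):
    """Assumed simple factor g_{z,pi}(lam), with a simple pole at lam = z."""
    return identity_plus((z - z.conjugate()) / (lam - z), P)

def simple_factor_inverse(z, P, lam):
    """Closed form of the inverse, with a simple pole at lam = conj(z)."""
    return identity_plus((z.conjugate() - z) / (lam - z.conjugate()), P)

def defect(A):
    """Largest entrywise deviation of A from the identity matrix."""
    return max(abs(A[i][j] - (1 if i == j else 0))
               for i in range(2) for j in range(2))

z = complex(0.3, 0.7)            # pole off the real axis
b = complex(0.4, -0.2)           # parameter with |b| < 1
P = projection([complex((1 - abs(b) ** 2) ** 0.5), b])
lam = complex(1.1, -0.5)         # generic test point

idempotent_defect = max(abs(matmul(P, P)[i][j] - P[i][j])
                        for i in range(2) for j in range(2))
reality_defect = defect(matmul(dagger(simple_factor(z, P, lam.conjugate())),
                               simple_factor(z, P, lam)))
inverse_defect = defect(matmul(simple_factor(z, P, lam),
                               simple_factor_inverse(z, P, lam)))
infinity_defect = defect(simple_factor(z, P, complex(1e9, 0.0)))

print(idempotent_defect, reality_defect, inverse_defect, infinity_defect)
```

All four printed numbers should be at the level of rounding error (the last one
is of order $|z - \bar z| / |\lambda|$). The projection used here is onto the
vector $(\sqrt{1 - |b|^2}, b)$, the same parametrization that appears in the
1-soliton example later in this section.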

Theorem G. (Uhlenbeck [U1]) The elements $g_{z,\pi}$ for
$z \in \mathbf{C} \setminus \mathbf{R}$ generate the group $\mathcal{D}_-^{disc}$.

It follows easily from Theorem G and the Bianchi Permutability Formula ([TU]
Theorem 10.13) that at each simple pole $z$ of an element of $\mathcal{D}_-^{disc}$
we can define a "residue", which is just the image of a certain orthogonal
projection, $\pi$. To be precise:

Theorem H. If $f \in \mathcal{D}_-$ and $z$ is a simple pole of $f$, then there
exists a unique orthogonal projection $\pi$ such that $f g_{z,\pi}^{-1}$ is
holomorphic at $z$.

The set of $f$ in $\mathcal{D}_-$ for which all the poles are simple is open and
dense, and it is for these $f$ that we will define "scattering coordinates", $S^f$.

Definition. Given $f$ in $\mathcal{D}_-$ with only simple poles, the scattering
coordinates of $f$, $S^f$, consist of the following data:

a) The set $D^f = \{z_1, \ldots, z_N\}$ of poles of $f$.

b) For each $z$ in $D^f$, the "residue" of $f$ at $z$, i.e., the image $V_z^f$ of
the unique orthogonal projection $\pi = \pi_z^f$ such that $f g_{z,\pi}^{-1}$ is
holomorphic at $z$.

c) The jump function of $f$, i.e., the map
$v^f : \mathbf{R} \to GL(n, \mathbf{C})$ defined by
$v^f(r) = f_-^{-1}(r) f_+(r)$.

The following theorem describes the evolution of the scattering coordinates $S^f$.

Theorem I. ([TU1]) If $f(t) \in \mathcal{D}_-$ evolves by the j-th flow and $f(0)$
has only simple poles, then $S^{f(t)}$ evolves as follows:

a) $D^{f(t)} = D^{f(0)}$,

b) For $z$ in $D^{f(0)}$, $V_z^{f(t)} = e^{-a z^j t}(V_z^{f(0)})$,

c) $v^{f(t)}(r) = e^{a r^j t}\, v^{f(0)}(r)\, e^{-a r^j t}$.
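
To connect this with the remark at the start of the section that scattering
coordinates evolve by a linear ODE with constant coefficients, one can
differentiate the conjugation formula for the jump function. The following is a
sketch for the continuous part only. The entrywise formula additionally assumes
that $a$ is diagonal, $a = \mathrm{diag}(a_1, \ldots, a_n)$, as in the setup
earlier in the article.

$$v^{f(t)}(r) = e^{a r^j t}\, v^{f(0)}(r)\, e^{-a r^j t}
\ \Longrightarrow\
\frac{\partial}{\partial t} v^{f(t)}(r) = r^j\,\bigl[a,\, v^{f(t)}(r)\bigr],$$

$$v^{f(t)}_{kl}(r) = e^{(a_k - a_l)\, r^j t}\, v^{f(0)}_{kl}(r).$$

For each fixed real $r$ this is a linear ODE in $t$ with constant coefficients,
and each matrix entry of the jump function evolves independently by an explicit
exponential factor, while the pole set does not move at all.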

We next explain how to recover $f \in \mathcal{D}_-$ from $S^f$. To do this first
write $f = gh$ with $g \in \mathcal{D}_-^{disc}$ and $h \in \mathcal{D}_-^{cont}$.
Then $v^f = f_-^{-1} f_+ = (g_- h_-)^{-1}(g_+ h_+) = h_-^{-1} h_+$, since, as we saw
above, $g_- = g_+$. It follows from uniqueness of the Birkhoff decomposition that
$v^f$ determines $h_-$ and $h_+$ and hence $h$. (Recall that $h$ in $\mathbf{C}_+$
(respectively $\mathbf{C}_-$) is the unique meromorphic extension of $h_+$
(respectively $h_-$).) On the other hand, from the poles $z$ of $g$ and the
residues $\pi_z^f$ of $g$ at these poles we can recover $g$ and hence $f = gh$.

There is again an explicit formula for "pure solitons", or "reflectionless
potentials" (i.e., $u \in \mathcal{P}_0$ such that $f^u$ is in
$\mathcal{D}_-^{disc}$). We will content ourselves here with writing the formula
for the 1-solitons of NLS, i.e., a single simple pole, say at $z = r + is$, with
residue the projection of $\mathbf{C}^2$ onto the vector
$(\sqrt{1 - |b|^2},\, b)$, where $b \in \mathbf{C}$ with $|b| < 1$. Then the
solution $q(x, t)$ of NLS is:

$$q(x,t) = \frac{4sb\sqrt{1 - |b|^2}\; e^{(-2irx + (r^2 - s^2)t)}}
{e^{-2(sx + 2rst)}(1 - |b|^2) + e^{2(sx + 2rst)}|b|^2}.$$

(For n-soliton formulas, see [FT] for the su(2) case and [TU2] for the su(n) case.)

Recall that we have a natural bijection
$\mathcal{P}_0 \to \mathcal{D}_- \to \mathcal{D}_-/\mathcal{H}_-$, where the first
arrow is the Scattering Transform, $\mathcal{F}_{scat}$, and the second is the
natural coset projection. Since we have a natural action of $\mathcal{D}_-$ on its
coset space $\mathcal{D}_-/\mathcal{H}_-$, this induces an action of
$\mathcal{D}_-$ on $\mathcal{P}_0$, and so the subgroups $\mathcal{D}_-^{cont}$ and
$\mathcal{D}_-^{disc}$ also act on $\mathcal{P}_0$. The orbit of $0$ under
$\mathcal{D}_-^{disc}$ gives the reflectionless potentials or pure solitons, while
the orbit of $0$ under $\mathcal{D}_-^{cont}$ gives the potentials without poles.

We can now at last explain how the notion of Bäcklund transformation fits
into this picture; namely, the actions of the generators $g_{z,\pi}$ of
$\mathcal{D}_-^{disc}$ on $\mathcal{P}_0$ are just the classical Bäcklund
transformations. Typically they add one to the number of solitons in a solution.